Breaking the curse of dimensionality in quadratic discriminant analysis models with a novel variant of a Bayes classifier enhances automated taxa identification of freshwater macroinvertebrates


Author(s) - Ärje J., Kärkkäinen S., Turpeinen T., Meissner K.

Publication year - 2013

Publication title - Environmetrics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.68

H-Index - 58

eISSN - 1099-095X

pISSN - 1180-4009

DOI - 10.1002/env.2208

Subject(s) - linear discriminant analysis, Bayes' theorem, artificial intelligence, curse of dimensionality, naive Bayes classifier, pattern recognition (psychology), context (archaeology), random forest, quadratic classifier, identification (biology), computer science, Bayesian probability, classifier (UML), statistics, mathematics, machine learning, ecology, biology, support vector machine, paleontology

Macroinvertebrate samples are commonly used in biomonitoring to study changes in aquatic ecosystems. Traditionally, specimens are identified to taxa manually by human experts, which is time-consuming and cost-intensive. Using image data of 35 taxa and 64 features, we propose a novel variant of quadratic discriminant analysis that breaks the curse of dimensionality in quadratic discriminant analysis models. Our variant, called a random Bayes array (RBA), uses bagging and random feature selection similar to random forest. We explore several variations of RBA. We consider three classification (i.e., taxa identification) decisions: majority vote, averaged posterior probabilities, and a novel approach, a score of weighted votes. Besides modifying the voting, we propose to weight features according to their importance instead of eliminating the least important features. We compared the performance of RBA with traditional Bayesian and several other popular classification methods and assessed how the methods behave in relation to each other and to the different macroinvertebrate species. Further, we investigated how severely misclassifications affect the performance of the different methods when set into a biomonitoring context. We found that the lowest and least severe classification error (i.e., the most accurate taxa identification) was achieved with RBA by using averaged posterior probabilities and weighted features. Copyright © 2013 John Wiley & Sons, Ltd.
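
The core idea described in the abstract (many quadratic discriminant classifiers, each trained on a bootstrap sample and a random subset of features, then combined) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (fit_rba, predict, and so on), the ridge term added to each covariance matrix, the per-class (stratified) bootstrap, and the toy data are all assumptions made here for illustration. Only two of the three decision rules are shown, majority vote and averaged posterior probabilities; the weighted-vote score and the feature weighting are not specified in the abstract and are therefore left out.

```python
import math
import random

def fit_gaussian(rows, ridge=1e-3):
    """Class mean and Cholesky factor of a ridge-stabilised covariance (ridge is an assumption)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / max(n - 1, 1)
            + (ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
    chol = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return mean, chol

def log_density(x, mean, chol):
    """Gaussian log-density up to the 2*pi constant, which cancels in the posterior."""
    z = []
    for i in range(len(x)):  # forward substitution: chol @ z = x - mean
        z.append((x[i] - mean[i] - sum(chol[i][k] * z[k] for k in range(i))) / chol[i][i])
    return -0.5 * sum(v * v for v in z) - sum(math.log(chol[i][i]) for i in range(len(x)))

def fit_qda(X, y, feats):
    """One QDA member: log prior, mean and covariance factor per class, on selected features."""
    model = {}
    for c in sorted(set(y)):
        rows = [[x[j] for j in feats] for x, lab in zip(X, y) if lab == c]
        model[c] = (math.log(len(rows) / len(y)),) + fit_gaussian(rows)
    return model

def posterior(model, feats, x):
    """Posterior class probabilities from one member (Bayes' rule, computed in log space)."""
    xs = [x[j] for j in feats]
    logp = {c: prior + log_density(xs, mean, chol) for c, (prior, mean, chol) in model.items()}
    top = max(logp.values())
    w = {c: math.exp(v - top) for c, v in logp.items()}
    total = sum(w.values())
    return {c: v / total for c, v in w.items()}

def fit_rba(X, y, n_members=25, n_feats=2, seed=0):
    """Ensemble of QDA members, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(y):
        by_class.setdefault(lab, []).append(i)
    members = []
    for _ in range(n_members):
        # Resample within each class so no class goes missing (a simplifying assumption).
        idx = [rng.choice(ids) for ids in by_class.values() for _ in ids]
        feats = rng.sample(range(len(X[0])), n_feats)
        members.append((feats, fit_qda([X[i] for i in idx], [y[i] for i in idx], feats)))
    return members

def predict(members, x, rule="average"):
    """Combine members by majority vote ("vote") or by averaged posteriors ("average")."""
    posts = [posterior(model, feats, x) for feats, model in members]
    classes = sorted(posts[0])
    if rule == "vote":
        votes = [max(classes, key=p.get) for p in posts]
        return max(classes, key=votes.count)
    return max(classes, key=lambda c: sum(p[c] for p in posts))

# Toy data (hypothetical): two well-separated Gaussian classes with four features.
rng = random.Random(1)
X = [[rng.gauss(4.0 * c, 1.0) for _ in range(4)] for c in (0, 1) for _ in range(30)]
y = [c for c in (0, 1) for _ in range(30)]

members = fit_rba(X, y)
for rule in ("vote", "average"):
    acc = sum(predict(members, x, rule) == lab for x, lab in zip(X, y)) / len(y)
    print(rule, acc)
```

Each member estimates covariance matrices over only n_feats features rather than all of them, so the number of parameters per class stays small relative to the sample size; this is the sense in which random feature selection sidesteps the dimensionality problem of a full quadratic discriminant model.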
