Breaking the curse of dimensionality in quadratic discriminant analysis models with a novel variant of a Bayes classifier enhances automated taxa identification of freshwater macroinvertebrates
Author(s) - Ärje J., Kärkkäinen S., Turpeinen T., Meissner K.
Publication year - 2013
Publication title - Environmetrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.68
H-Index - 58
eISSN - 1099-095X
pISSN - 1180-4009
DOI - 10.1002/env.2208
Subject(s) - linear discriminant analysis , bayes' theorem , artificial intelligence , curse of dimensionality , naive bayes classifier , pattern recognition (psychology) , context (archaeology) , random forest , quadratic classifier , identification (biology) , computer science , bayesian probability , classifier (uml) , statistics , mathematics , machine learning , ecology , biology , support vector machine , paleontology
Macroinvertebrate samples are commonly used in biomonitoring to study changes in aquatic ecosystems. Traditionally, specimens are identified to taxa manually by human experts, which is time-consuming and cost-intensive. Using image data on 35 taxa described by 64 features, we propose a novel variant of quadratic discriminant analysis that breaks the curse of dimensionality in quadratic discriminant analysis models. Our variant, called a random Bayes array (RBA), uses bagging and random feature selection similar to random forest. We explore several variations of RBA. We consider three classification (i.e., taxa identification) decision rules: majority vote, averaged posterior probabilities, and a novel approach, a score of weighted votes. Besides modifying the voting, we propose weighting features according to their importance instead of eliminating the least important features. We compared the performance of RBA with traditional Bayesian and several other popular classification methods and assessed how the methods behave relative to each other and across the different macroinvertebrate species. Further, we investigate how severely misclassifications affect the performance of the different methods in a biomonitoring context. We found that the lowest and least severe classification error (i.e., the most accurate taxa identification) was achieved with RBA using averaged posterior probabilities and weighted features. Copyright © 2013 John Wiley & Sons, Ltd.
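The abstract describes RBA only at a high level. As an illustration, here is a minimal NumPy sketch of the general idea (bagged QDA models, each fitted on a random subset of features). It rests on several assumptions not stated in the abstract: a hand-rolled Gaussian QDA base learner with a small ridge term, stratified bootstrap resampling, and a fixed number of features per member. It implements two of the three decision rules named above (majority vote and averaged posterior probabilities). The weighted-vote score is not reproduced because the abstract does not define it. Feature weighting is modelled here, as an assumption, by drawing features with probability proportional to a user-supplied importance vector. All function names (`fit_qda`, `log_posteriors`, `fit_rba`, `predict_rba`) and parameters (`n_members`, `n_features`, `feature_weights`, `reg`) are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def fit_qda(X, y, classes, reg=1e-3):
    """Fit one Gaussian (mean, covariance, prior) per class: a plain QDA model."""
    k = X.shape[1]
    params = []
    for c in classes:
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        # Small ridge term keeps each covariance invertible on resampled data
        # (a hypothetical regularisation choice, not taken from the paper).
        cov = np.atleast_2d(np.cov(Xc, rowvar=False)) + reg * np.eye(k)
        params.append((mu, cov, len(Xc) / len(X)))
    return params


def log_posteriors(X, params):
    """Log posterior class probabilities under the fitted Gaussians."""
    scores = []
    for mu, cov, prior in params:
        diff = X - mu
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of every row to the class mean.
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        # The constant -k/2 * log(2*pi) is dropped; it cancels on normalisation.
        scores.append(np.log(prior) - 0.5 * logdet - 0.5 * maha)
    S = np.column_stack(scores)
    S = S - S.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    return S - np.log(np.exp(S).sum(axis=1, keepdims=True))


def fit_rba(X, y, n_members=50, n_features=5, feature_weights=None, seed=0):
    """Bag QDA models, each trained on a random subset of features."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    p = None
    if feature_weights is not None:
        # ASSUMPTION: importance weights act as feature-sampling probabilities.
        p = np.asarray(feature_weights, dtype=float)
        p = p / p.sum()
    members = []
    for _ in range(n_members):
        # Stratified bootstrap: resample within each class so every class keeps
        # its original count (assumes at least 2 training samples per class).
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=int((y == c).sum()), replace=True)
            for c in classes
        ])
        feats = rng.choice(X.shape[1], size=n_features, replace=False, p=p)
        members.append((feats, fit_qda(X[idx][:, feats], y[idx], classes)))
    return classes, members


def predict_rba(X, classes, members, rule="average"):
    """Combine members by majority vote or by averaged posterior probabilities."""
    n, k = len(X), len(classes)
    votes = np.zeros((n, k))
    post_sum = np.zeros((n, k))
    for feats, params in members:
        logp = log_posteriors(X[:, feats], params)
        post_sum += np.exp(logp)
        votes[np.arange(n), logp.argmax(axis=1)] += 1
    if rule == "majority":
        return classes[votes.argmax(axis=1)]
    if rule == "average":
        return classes[(post_sum / len(members)).argmax(axis=1)]
    raise ValueError(f"unknown rule: {rule!r}")
```

Typical use would be `classes, members = fit_rba(X_train, y_train, n_members=100, n_features=8)` followed by `predict_rba(X_test, classes, members, rule="average")`. Because each member estimates its class covariances in only `n_features` dimensions, far fewer specimens per taxon are needed than for a full 64-feature QDA, where a class's sample covariance is singular whenever that taxon has 64 or fewer training specimens.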
