COMBINING MULTIPLE MACHINE LEARNING ALGORITHMS TO PREDICT TAXA UNDER REFERENCE CONDITIONS FOR STREAMS BIOASSESSMENT
Author(s) - Feio M. J., VianaFerreira C., Costa C.
Publication year - 2014
Publication title - River Research and Applications
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.679
H-Index - 94
eISSN - 1535-1467
pISSN - 1535-1459
DOI - 10.1002/rra.2707
Subject(s) - support vector machine, taxon, invertebrate, computer science, machine learning, artificial intelligence, perceptron, streams, data mining, pattern recognition (psychology), artificial neural network, ecology, biology, computer network
In the present study, we tested the potential of combining three machine learning techniques in a bioassessment tool to predict more accurately the pool of taxa expected at a site. This tool, Hydra, uses the best performing technique among Support Vector Machines (SVM), Multi‐layer Perceptron (MLP) and K‐Nearest Neighbour (KNN) to predict the taxa expected at a stream site, and then evaluates the quality of the site through a classification system based on observed/expected (O/E) values, similar to that used in River Invertebrate Prediction and Classification System (RIVPACS) models. To test the procedure, we used a dataset of 137 training sites, 15 validation sites and 174 test sites (potentially disturbed) from Portuguese streams. The combined use of the three machine learning techniques was more effective in predicting the invertebrate taxa at a site than the use of any one of them alone. All three methods were tested for every invertebrate taxon; with the present dataset, SVM and KNN were most often the best performing techniques (that is, the most accurate of the three for the largest number of taxa). The combination of all algorithms implemented in Hydra resulted in good models for stream bioassessment (e.g. SD of OE50 < 0.2; regression of O vs E: R² > 0.6; Spearman correlations with global degradation > 0.7). We also found no advantage in removing rare taxa from the training dataset, and found that 50% is the most adequate accuracy level for calculating OE ratios through Hydra. Future work should compare the performance of this technique with that of others, such as the RIVPACS models, using the same datasets. Considering the flexibility of this technique, its self‐adjustment and its easy implementation through a website (aquaweb.uc.pt), we expect it to be useful also for predicting other aquatic elements such as fishes and algae. Copyright © 2013 John Wiley & Sons, Ltd.
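
The per-taxon selection described above can be illustrated with a minimal Python sketch. It is not the Hydra implementation: all function names, the toy data and the single environmental variable are invented for illustration. To stay self-contained it uses a plain KNN and a majority-class baseline as stand-ins for the SVM, MLP and KNN candidates, and it assumes that the "accuracy level" acts as a cut-off on each taxon's validation accuracy before the O/E ratio is computed, which the abstract does not spell out.

```python
from collections import Counter
from math import dist


def knn_factory(k):
    """Return a trainer that builds a k-nearest-neighbour presence/absence classifier."""
    def train(X, y):
        def predict(x):
            nearest = sorted(range(len(X)), key=lambda i: dist(X[i], x))[:k]
            return Counter(y[i] for i in nearest).most_common(1)[0][0]
        return predict
    return train


def majority_factory():
    """Return a trainer for a baseline that always predicts the commonest label."""
    def train(X, y):
        label = Counter(y).most_common(1)[0][0]
        return lambda x: label
    return train


def accuracy(predict, X, y):
    """Fraction of sites whose presence/absence is predicted correctly."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def select_best_per_taxon(candidates, X_train, Y_train, X_val, Y_val):
    """Train every candidate for each taxon; keep the most accurate on validation sites."""
    best = {}
    for taxon, labels in Y_train.items():
        scored = []
        for name, trainer in candidates.items():
            model = trainer(X_train, labels)
            scored.append((accuracy(model, X_val, Y_val[taxon]), name, model))
        acc, name, model = max(scored, key=lambda s: s[0])
        best[taxon] = (name, acc, model)
    return best


def oe_ratio(best, x_site, observed, min_accuracy=0.5):
    """O/E: share of expected taxa that were observed (cut-off use is an assumption)."""
    expected = {t for t, (_, acc, model) in best.items()
                if acc >= min_accuracy and model(x_site) == 1}
    if not expected:
        return None
    return len(expected & observed) / len(expected)


# Invented toy data: one environmental variable per site, 1 = taxon present.
X_train = [(0.0,), (1.0,), (2.0,), (8.0,), (9.0,), (10.0,)]
Y_train = {"taxonA": [1, 1, 0, 0, 0, 0], "taxonB": [1, 1, 1, 1, 1, 0]}
X_val = [(0.4,), (9.6,)]
Y_val = {"taxonA": [1, 0], "taxonB": [1, 1]}

candidates = {"knn1": knn_factory(1), "majority": majority_factory()}
best = select_best_per_taxon(candidates, X_train, Y_train, X_val, Y_val)
site_oe = oe_ratio(best, (0.6,), observed={"taxonA"})
```

In this toy case a different candidate wins for each taxon, which is the behaviour the abstract attributes to combining techniques; real candidates would be substituted by passing other trainers in the `candidates` dictionary.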
