Robust principal component analysis and outlier detection with ecological data
Author(s) - Jackson Donald A., Chen Yong
Publication year - 2004
Publication title - Environmetrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.68
H-Index - 58
eISSN - 1099-095X
pISSN - 1180-4009
DOI - 10.1002/env.628
Subject(s) - outlier, univariate, mahalanobis distance, principal component analysis, bivariate analysis, multivariate statistics, robust statistics, statistics, anomaly detection, computer science, population, data mining, mathematics, sociology, demography
Ecological studies frequently involve large numbers of variables and observations, and these are often subject to various errors. If some data are not representative of the study population, they tend to bias the interpretation and conclusions of an ecological study. Because ecological data are multivariate, it is very difficult to identify atypical observations using approaches such as univariate or bivariate plots. This difficulty calls for the application of robust statistical methods in identifying atypical observations. Our study compares a standard multivariate method, based on the Mahalanobis distance, with a robust method based on the minimum volume ellipsoid as a means of determining whether data sets contain outliers. We evaluate both methods using simulations under varying data conditions, and show that the minimum volume ellipsoid approach is superior in detecting outliers where present. We show that, as the sample size parameter, h, used in the robust approach increases in value, the accuracy and precision of the associated estimate of the number of outliers present decrease, in particular as the number of outliers increases. Conversely, where no outliers are present, large values of the parameter provide the most accurate results. In addition to the simulation results, we demonstrate the use of robust principal component analysis with a data set of lake‐water chemistry variables to illustrate the additional insight available. We suggest that ecologists consider that their data may contain atypical points. Following checks associated with normality, bivariate linearity and other traditional aspects, we advocate that ecologists examine their data sets using robust multivariate methods. Points identified as being atypical should be carefully evaluated based on background information to determine their suitability for inclusion in further multivariate analyses and whether additional factors explain their unusual characteristics. Copyright © 2004 John Wiley & Sons, Ltd.
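
The contrast the abstract draws between the classical Mahalanobis distance and a minimum volume ellipsoid (MVE) estimate can be illustrated with a small sketch. This is not the authors' code or simulation design. It assumes NumPy, a hypothetical bivariate data set with 15 clustered outliers, a simple random-subset approximation to the MVE, and a 97.5% chi-square cutoff, none of which are taken from the paper. The function names (`mahalanobis_sq`, `approx_mve`, `chi2_quantile_2df`) are illustrative only.

```python
import numpy as np

def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X from center under cov."""
    diff = X - center
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def approx_mve(X, h, n_trials=500, seed=0):
    """Approximate the MVE by random (p+1)-point subsets (an assumed scheme).

    For each subset, the ellipsoid defined by its mean and covariance is
    inflated until it covers h points; the smallest such ellipsoid is kept.
    Returns (center, cov, m2), where m2 is the inflation factor.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_volume, best = np.inf, None
    for _ in range(n_trials):
        idx = rng.choice(n, size=p + 1, replace=False)
        center = X[idx].mean(axis=0)
        cov = np.cov(X[idx], rowvar=False)
        det = np.linalg.det(cov)
        if det <= 1e-12:  # skip (near-)degenerate subsets
            continue
        d2 = mahalanobis_sq(X, center, cov)
        m2 = np.partition(d2, h - 1)[h - 1]    # h-th smallest squared distance
        volume = np.sqrt(det) * m2 ** (p / 2)  # proportional to ellipsoid volume
        if volume < best_volume:
            best_volume, best = volume, (center, cov, m2)
    return best

def chi2_quantile_2df(q):
    """Chi-square quantile for 2 degrees of freedom (closed form, p = 2 only)."""
    return -2.0 * np.log(1.0 - q)

# Hypothetical data: 85 correlated "typical" points plus 15 clustered outliers.
rng = np.random.default_rng(42)
n_clean, n_out = 85, 15
clean = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=n_clean)
outliers = rng.normal([5.0, -5.0], 0.2, size=(n_out, 2))
X = np.vstack([clean, outliers])
is_outlier = np.arange(len(X)) >= n_clean
n, p = X.shape
cutoff = chi2_quantile_2df(0.975)  # assumed flagging threshold

# Classical approach: mean and covariance computed from all observations.
d2_classical = mahalanobis_sq(X, X.mean(axis=0), np.cov(X, rowvar=False))
classical_hits = int(np.sum((d2_classical > cutoff) & is_outlier))

# Robust approach: the ellipsoid must cover h points (the parameter h).
h = (n + p + 1) // 2
center, cov, m2 = approx_mve(X, h)
robust_cov = cov * m2 / chi2_quantile_2df(h / n)  # consistency rescaling
d2_robust = mahalanobis_sq(X, center, robust_cov)
robust_hits = int(np.sum((d2_robust > cutoff) & is_outlier))

print(f"planted outliers flagged: classical={classical_hits}, robust={robust_hits}")
```

In this sketch the clustered outliers inflate the classical covariance estimate, which shrinks their own distances (the masking effect), whereas the robust estimate is built from a subset of the data and is far less affected. Raising `h` toward `n` makes the robust estimate behave more like the classical one, which is in line with the abstract's remark that larger values of h reduce accuracy when outliers are present.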