Parallel Analysis: a method for determining significant principal components
Author(s) -
Franklin Scott B.,
Gibson David J.,
Robertson Philip A.,
Pohlmann John T.,
Fralish James S.
Publication year - 1995
Publication title -
Journal of Vegetation Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.1
H-Index - 115
eISSN - 1654-1103
pISSN - 1100-9233
DOI - 10.2307/3236261
Subject(s) - principal component analysis , spurious relationship , eigenvalues and eigenvectors , covariance matrix , dimensionality reduction , mathematics , statistics , data set , matrix (chemical analysis) , curse of dimensionality , set (abstract data type) , random matrix , sparse pca , pattern recognition (psychology) , computer science , artificial intelligence , chemistry , physics , chromatography , quantum mechanics , programming language
Abstract. Numerous ecological studies use Principal Components Analysis (PCA) for exploratory analysis and data reduction. Determination of the number of components to retain is the most crucial problem confronting the researcher when using PCA. An incorrect choice may lead to the underextraction of components, but commonly results in overextraction. Of several methods proposed to determine the significance of principal components, Parallel Analysis (PA) has proven consistently accurate in determining the threshold for significant components, variable loadings, and analytical statistics when decomposing a correlation matrix. In this procedure, eigenvalues from a data set prior to rotation are compared with those from a matrix of random values of the same dimensionality (p variables and n samples). PCA eigenvalues from the data greater than PA eigenvalues from the corresponding random data can be retained. All components with eigenvalues below this threshold value should be considered spurious. We illustrate Parallel Analysis on an environmental data set. We reviewed all articles utilizing PCA or Factor Analysis (FA) from 1987 to 1993 from Ecology, Ecological Monographs, Journal of Vegetation Science and Journal of Ecology. Analyses were first separated into those PCA which decomposed a correlation matrix and those PCA which decomposed a covariance matrix. Parallel Analysis (PA) was applied for each PCA/FA found in the literature. Of 39 analyses (in 22 articles), 29 (74.4 %) considered no threshold rule, presumably retaining interpretable components. According to the PA results, 26 (66.7 %) overextracted components. This overextraction may have resulted in potentially misleading interpretation of spurious components. It is suggested that the routine use of PA in multivariate ordination will increase confidence in the results and reduce the subjective interpretation of supposedly objective methods.
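The comparison described in the abstract can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' implementation. The function name `parallel_analysis`, the use of standard-normal random values, the number of simulated matrices (`n_iter`), and the choice to average the random eigenvalues across simulations are all assumptions made here. The abstract only specifies that eigenvalues of the data's correlation matrix are compared with those from random data of the same dimensionality (p variables and n samples).

```python
import numpy as np


def parallel_analysis(data, n_iter=200, seed=0):
    """Sketch of Parallel Analysis on a correlation matrix.

    data   : array of shape (n samples, p variables)
    n_iter : number of random matrices to simulate (assumed value)
    Returns (observed eigenvalues, random-data thresholds, components retained).
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape

    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    # Eigenvalues from random matrices with the same n and p.
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        noise = rng.standard_normal((n, p))  # assumption: normal random values
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]

    # Assumption: the threshold for each component is the mean random eigenvalue.
    threshold = random_eigs.mean(axis=0)

    # Retain leading components until the first one that fails the comparison.
    exceeds = observed > threshold
    n_retain = p if exceeds.all() else int(np.argmin(exceeds))
    return observed, threshold, n_retain


# Hypothetical data: six variables driven by two independent latent gradients.
rng = np.random.default_rng(1)
latent = rng.standard_normal((200, 2))
demo = np.repeat(latent, 3, axis=1) + 0.3 * rng.standard_normal((200, 6))
observed, threshold, n_retain = parallel_analysis(demo)
print(n_retain)
```

A stricter variant replaces the mean with an upper percentile of the simulated eigenvalues; which threshold is appropriate is a design choice that this sketch leaves open.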