On Using Principal Components before Separating a Mixture of Two Multivariate Normal Distributions
Author(s) - Chang WeiChien
Publication year - 1983
Publication title - Journal of the Royal Statistical Society: Series C (Applied Statistics)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.205
H-Index - 72
eISSN - 1467-9876
pISSN - 0035-9254
DOI - 10.2307/2347949
Subject(s) - multivariate statistics, principal component analysis, multivariate normal distribution, statistics, mathematics, multivariate analysis, multivariate analysis of variance
SUMMARY When principal components are used to reduce the dimension of the data before clustering, the usual practice is to retain the components with the largest eigenvalues. We prove, using a mixture of two multivariate normal distributions, that this practice is not justified in general. A relationship between the distance separating the two subpopulations and any subset of principal components is derived, showing that the components with the larger eigenvalues do not necessarily contain more information (distance). This result is further demonstrated with hypothetical examples and with real data. The effect of scaling the variables on how the information is distributed among the components is investigated. An application to a mixture of two normal distributions is illustrated with a set of generated data in which the information is concentrated in the components with the largest and the smallest eigenvalues.
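
The central claim can be checked numerically with a small sketch. The example below is not taken from the paper: the covariance matrix `sigma`, the mean difference `d`, the mixing proportion `p`, and the helper `subset_distance` are all hypothetical choices made for illustration. It assumes two normal subpopulations with a common within-group covariance, so that the mixture covariance is `sigma + p * (1 - p) * outer(d, d)`, and it measures "information" as the squared Mahalanobis distance between the two means after projecting onto a chosen set of principal components.

```python
import numpy as np


def subset_distance(d, sigma, vectors):
    """Squared Mahalanobis distance between the two subpopulation means
    after projecting onto the columns of `vectors` (hypothetical helper)."""
    proj_d = vectors.T @ d                     # projected mean difference
    proj_sigma = vectors.T @ sigma @ vectors   # projected within-group covariance
    return float(proj_d @ np.linalg.solve(proj_sigma, proj_d))


# Illustrative (assumed) population parameters, not data from the paper.
p = 0.5                              # mixing proportion
sigma = np.diag([10.0, 1.0, 0.1])    # common within-group covariance
d = np.array([2.0, 0.0, 1.0])        # difference between the two means

# Covariance of the mixture: within-group part plus between-group part.
mixture_cov = sigma + p * (1 - p) * np.outer(d, d)

# Principal components of the mixture, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(mixture_cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Distance carried by each single component, and by all components together.
per_component = [subset_distance(d, sigma, eigvecs[:, [j]]) for j in range(3)]
total = subset_distance(d, sigma, eigvecs)
first_and_last = subset_distance(d, sigma, eigvecs[:, [0, 2]])

for j in range(3):
    print(f"PC{j + 1}: eigenvalue={eigvals[j]:.3f}, distance={per_component[j]:.3f}")
print(f"total={total:.3f}, PC1+PC3={first_and_last:.3f}")
```

In this constructed case the component with the smallest eigenvalue carries the largest single-component distance, the middle component carries essentially none, and the first and last components together recover the full distance. Keeping only the leading components would therefore discard most of the separation.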