Omitting correlated variables
Author(s) - Larry Jenkins, Murray Anderson
Publication year - 2014
Publication title - orion/orion
Language(s) - English
Resource type - Journals
eISSN - 2224-0004
pISSN - 0259-191X
DOI - 10.5784/18-0-183
Subject(s) - principal component analysis, variance (accounting), variable (mathematics), covariance matrix, covariance and correlation, variables, computer science, design matrix, principal (computer security), statistics, covariance, explained variation, matrix (chemical analysis), data matrix, measure (data warehouse), conditioning, mathematics, data mining, regression analysis, random variable, sum of normally distributed random variables, materials science, convergence of random variables, business, mathematical analysis, chemistry, composite material, operating system, biochemistry, accounting, clade, gene, phylogenetic tree
Data collected on the physical, biological or man-made world are often highly correlated, raising the question of whether fewer variables would contain almost as much information. A crude solution is simply to inspect the Pearson correlation matrix and omit one variable from each highly correlated pair. A more systematic method is to condition on one or more variables and examine the resulting partial covariance matrix. If the remaining variables have little variance after the conditioning, then the conditioning variables contain most of the information in all the original variables. Paralleling the usual tests applied in judging how many principal components are sufficient to represent all the data, we can use the amount of variance explained by the conditioning variable(s) as a measure of information content. The paper references earlier work in this area, explains the computation and includes examples using published data sets. The approach is found to be highly competitive with using principal components, and has the obvious advantage over principal components of simply omitting some of the original variables from further consideration. The method has been coded in Visual Basic add-ins to an Excel spreadsheet.
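
The conditioning step described in the abstract can be illustrated with a minimal sketch. It is not the authors' Visual Basic code, and the abstract does not give their exact formulas; the sketch assumes the standard partial covariance (the Schur complement S_rr - S_rk S_kk^{-1} S_kr, where k indexes the kept variables and r the rest) and assumes that "variance explained" means one minus the residual trace divided by the total trace. The function names `partial_covariance` and `variance_explained`, and the small correlation matrix, are made up for illustration.

```python
import numpy as np


def partial_covariance(cov, keep):
    """Covariance of the omitted variables after conditioning on `keep`.

    Assumes the standard Schur complement S_rr - S_rk S_kk^{-1} S_kr.
    """
    cov = np.asarray(cov, dtype=float)
    keep = list(keep)
    rest = [i for i in range(cov.shape[0]) if i not in keep]
    s_rr = cov[np.ix_(rest, rest)]
    s_rk = cov[np.ix_(rest, keep)]
    s_kk = cov[np.ix_(keep, keep)]
    # Solve S_kk X = S_kr instead of forming an explicit inverse.
    return s_rr - s_rk @ np.linalg.solve(s_kk, s_rk.T)


def variance_explained(cov, keep):
    """Share of total variance accounted for by the kept variables.

    Assumed criterion: 1 - trace(partial covariance) / trace(cov).
    The kept variables count as fully explained by themselves.
    """
    cov = np.asarray(cov, dtype=float)
    residual = np.trace(partial_covariance(cov, keep))
    return 1.0 - residual / np.trace(cov)


# Illustrative (made-up) correlation matrix: variables 0 and 1 are
# strongly correlated, variable 2 is unrelated to both.
corr = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

print(variance_explained(corr, keep=[0]))     # keep only variable 0
print(variance_explained(corr, keep=[0, 2]))  # keep variables 0 and 2
```

In this toy matrix, keeping variables 0 and 2 leaves very little residual variance in variable 1, which is the situation in which the abstract suggests variable 1 could be omitted. Working from a correlation matrix rather than a raw covariance matrix is a choice made here so that every variable carries equal weight.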
