
An experiment on selecting most informative variables in socio-economic data
Author(s) - Larry Jenkins
Publication year - 2014
Publication title - orion/orion
Language(s) - English
Resource type - Journals
eISSN - 2224-0004
pISSN - 0259-191X
DOI - 10.5784/19-0-181
Subject(s) - variables, variable (mathematics), statistics, variance (accounting), principal (computer security), econometrics, principal component analysis, mathematics, explained variation, measure (data warehouse), computer science, data mining, economics, mathematical analysis, accounting, operating system
In many studies where data are collected on several variables, there is a motivation to find out whether fewer variables would provide almost as much information. The variance of a variable about its mean is the common statistical measure of information content, and that measure is used here. We are interested in whether the variability in one variable is sufficiently correlated with that in one or more of the other variables that the first variable is redundant. We wish to find one or more ‘principal variables’ that sufficiently reflect the information content in all the original variables. The paper explains the method of principal variables and reports experiments using the technique to see whether just a few variables are sufficient to reflect the information in 11 socioeconomic variables on 130 countries from a World Bank (WB) database. While the method of principal variables is highly successful in a statistical sense, the WB data vary greatly from year to year, demonstrating that fewer variables would be inadequate for this data set.
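The idea of picking a few variables that account for most of the variance in all of them can be illustrated with a minimal sketch. It assumes one common greedy formulation of principal-variable selection on the correlation matrix (pick the variable whose squared correlations with all variables sum highest, remove its linear effect, repeat); the abstract does not specify the paper's exact procedure, so this may differ from it. The function name `principal_variables`, its parameters, and the synthetic data are hypothetical and are not taken from the paper or the WB database.

```python
import numpy as np

def principal_variables(X, k):
    """Greedily select up to k columns of X (rows = observations).

    Assumed criterion: at each step choose the variable j that maximises
    sum_i S[i, j]**2 / S[j, j], where S is the (partial) correlation-scale
    covariance matrix of all variables given those already chosen.
    Returns the chosen column indices and the cumulative fraction of
    total variance explained after each choice.
    """
    S = np.corrcoef(X, rowvar=False)      # start from the correlation matrix
    total = np.trace(S)                   # total standardized variance
    remaining = list(range(S.shape[0]))
    chosen, explained = [], []
    for _ in range(k):
        # Skip variables whose residual variance is numerically zero.
        scores = {j: (S[:, j] ** 2).sum() / S[j, j]
                  for j in remaining if S[j, j] > 1e-12}
        if not scores:                    # nothing left to explain
            break
        best = max(scores, key=scores.get)
        chosen.append(best)
        remaining.remove(best)
        # Partial out the chosen variable from every other variable.
        S = S - np.outer(S[:, best], S[:, best]) / S[best, best]
        explained.append(1.0 - np.trace(S) / total)
    return chosen, explained

if __name__ == "__main__":
    # Synthetic illustration only: two latent factors observed through
    # six noisy indicators (not the data used in the paper).
    rng = np.random.default_rng(0)
    factors = rng.normal(size=(130, 2))
    loadings = rng.normal(size=(2, 6))
    X = factors @ loadings + 0.3 * rng.normal(size=(130, 6))
    idx, frac = principal_variables(X, 3)
    print("selected columns:", idx)
    print("cumulative variance explained:", np.round(frac, 3))
```

A variable would be treated as redundant when adding it raises the cumulative explained fraction only marginally; where to set that cut-off is a judgment call that the sketch leaves to the user.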