
Improving the estimation of educational attainment: New methods for assessing average years of schooling from binned data
Author(s) -
Joseph Friedman,
Nicholas Graetz,
Emmanuela Gakidou
Publication year - 2018
Publication title -
PLOS ONE
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.99
H-Index - 332
ISSN - 1932-6203
DOI - 10.1371/journal.pone.0208019
Subject(s) - statistics , standard error , educational attainment , estimation , mean squared error , metric (unit) , econometrics , standard deviation , population , duration (music) , observational error , census , contrast (vision) , demography , mathematics , computer science , economics , sociology , art , operations management , management , literature , artificial intelligence , economic growth
Background
Accurate measurement of educational attainment is of great importance for population research. Past studies measuring average years of schooling rely on strong assumptions to incorporate binned data. These assumptions, which we refer to as the standard duration method, have not previously been evaluated for bias or accuracy.

Methods
We assembled a database of 1,680 survey and census datasets, representing both binned and single-year education data. We developed two models that split bins of education into single-year values. We evaluated our models, and compared them to the standard duration method, using out-of-sample predictive validity.

Results
Our results indicate that typical methods used to split bins of educational attainment introduce substantial error and bias into estimates of average years of schooling, as compared to new approaches. Globally, the standard duration method underestimates average years of schooling, with a median error of -0.47 years. This effect is especially pronounced in datasets with a smaller number of bins or higher true average attainment, leading to irregular error patterns between geographies and time periods. Both models we developed resulted in unbiased predictions of average years of schooling, with smaller average error than previous methods. The approach that uses a metric of distance in space and time to identify training data had the best performance, with a root mean squared error of mean attainment of 0.26 years, compared to 0.92 years for the standard duration algorithm.

Conclusions
Education is a key social indicator and its accurate estimation should be a population research priority. The use of a space-time distance bin-splitting model drastically improved the estimation of average years of schooling from binned education data. We provide a detailed description of how to use the method and recommend that future studies estimating educational attainment across time or geographies use a similar approach.
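
To make the contrast in the abstract concrete, the following Python sketch compares two ways of turning binned data into a mean: collapsing each bin to a single value, and splitting each bin into single years using a reference distribution. It is an illustration under stated assumptions, not the authors' implementation. The function names, the example bins and shares, the use of each bin's lower bound as its fixed value, and the uniform reference distribution are all hypothetical. In particular, the abstract does not specify how the standard duration method assigns values to bins, and the paper's models choose their reference (training) data by space-time distance, which this sketch does not attempt.

```python
from math import sqrt


def mean_fixed_value(bin_shares, bin_values):
    """Mean years of schooling when each bin is collapsed to one value.

    Assumption: this stands in for a fixed-duration approach; the exact
    values the standard duration method uses are not given in the abstract.
    """
    return sum(share * value for share, value in zip(bin_shares, bin_values))


def split_bins(bin_shares, bins, reference):
    """Spread each bin's population share over its single years.

    bins      -- list of inclusive (low, high) year ranges
    reference -- hypothetical single-year weights, indexed by year
    Shares are allocated in proportion to the reference weights inside
    each bin, with a uniform fallback if the reference has no mass there.
    """
    single_year = {}
    for share, (low, high) in zip(bin_shares, bins):
        years = list(range(low, high + 1))
        weight = sum(reference[y] for y in years)
        for y in years:
            frac = reference[y] / weight if weight > 0 else 1.0 / len(years)
            single_year[y] = share * frac
    return single_year


def mean_years(single_year):
    """Mean years of schooling from a single-year distribution."""
    return sum(year * share for year, share in single_year.items())


def rmse(predicted, observed):
    """Root mean squared error across datasets."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical example: four bins covering 0 to 18 years of schooling.
BINS = [(0, 0), (1, 5), (6, 11), (12, 18)]
SHARES = [0.1, 0.3, 0.4, 0.2]
LOWER_BOUNDS = [low for low, _ in BINS]   # assumed fixed value per bin
UNIFORM_REFERENCE = [1.0] * 19            # placeholder reference weights

fixed_mean = mean_fixed_value(SHARES, LOWER_BOUNDS)
split_mean = mean_years(split_bins(SHARES, BINS, UNIFORM_REFERENCE))
print(f"fixed-value mean: {fixed_mean:.2f}, bin-split mean: {split_mean:.2f}")
```

In this toy setup the fixed-value mean is lower than the bin-split mean because everyone in a bin is assigned the bin's lowest value, which is one way a downward bias of the kind reported above could arise. With many datasets, `rmse` could be applied to predicted and observed means to reproduce the style of comparison described in the Results.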