How bandwidth selection algorithms impact exploratory data analysis using kernel density estimation.
Author(s) -
Jared K. Harpole,
Carol M. Woods,
Thomas L. Rodebaugh,
Cheri A. Levinson,
Eric J. Lenze
Publication year - 2014
Publication title -
Psychological Methods
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.981
H-Index - 151
eISSN - 1939-1463
pISSN - 1082-989X
DOI - 10.1037/a0036850
Subject(s) - estimator, kernel density estimation, smoothing, computer science, rule of thumb, multivariate kernel density estimation, sample size determination, density estimation, bandwidth (computing), statistics, algorithm, kernel (algebra), cross validation, variable kernel density estimation, mathematics, kernel method, artificial intelligence, support vector machine, computer network, combinatorics
Exploratory data analysis (EDA) can reveal important features of underlying distributions, and these features often have an impact on inferences and conclusions drawn from data. Graphical analysis is central to EDA, and graphical representations of distributions often benefit from smoothing. A viable method of estimating and graphing the underlying density in EDA is kernel density estimation (KDE). This article provides an introduction to KDE and examines alternative methods for specifying the smoothing bandwidth in terms of their ability to recover the true density. We also illustrate the comparison and use of KDE methods with 2 empirical examples. Simulations were carried out in which we compared 8 bandwidth selection methods (Sheather-Jones plug-in [SJDP], normal rule of thumb, Silverman's rule of thumb, least squares cross-validation, biased cross-validation, and 3 adaptive kernel estimators) using 5 true density shapes (standard normal, positively skewed, bimodal, skewed bimodal, and standard lognormal) and 9 sample sizes (15, 25, 50, 75, 100, 250, 500, 1,000, 2,000). Results indicate that, overall, SJDP outperformed all other methods. However, for smaller sample sizes (25 to 100) either biased cross-validation or Silverman's rule of thumb was recommended, and for larger sample sizes the adaptive kernel estimator with SJDP was recommended. Information is provided about implementing the recommendations in the R computing language.
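To make the role of the bandwidth concrete, the following is a minimal sketch of a fixed-bandwidth Gaussian KDE together with the two rule-of-thumb selectors named in the abstract. It is not the article's code (the article works in R): the function names are hypothetical, the formulas are the commonly cited textbook forms (1.06 · sd · n^(-1/5) for the normal rule and 0.9 · min(sd, IQR/1.34) · n^(-1/5) for Silverman's rule), and the quartiles come from Python's `statistics.quantiles`, whose default definition may differ slightly from R's.

```python
import math
import random
import statistics


def normal_rule_bandwidth(data):
    """Normal reference rule (assumed form): h = 1.06 * sd * n^(-1/5)."""
    n = len(data)
    return 1.06 * statistics.stdev(data) * n ** (-1 / 5)


def silverman_bandwidth(data):
    """Silverman's rule (assumed form): h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)  # sample quartiles
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde(data, h):
    """Return a function that evaluates the Gaussian KDE with bandwidth h at x."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps of width h centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


# Illustrative data only: 100 draws from a standard normal with a fixed seed.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]

h_normal = normal_rule_bandwidth(sample)
h_silverman = silverman_bandwidth(sample)
f_hat = gaussian_kde(sample, h_silverman)

print("normal rule h:", h_normal)
print("Silverman h:", h_silverman)
print("estimated density at 0:", f_hat(0.0))
```

Because Silverman's rule uses the smaller constant and the smaller of two spread measures, it never yields a wider bandwidth than the normal rule on the same data, so its estimate is smoothed slightly less. The plug-in, cross-validation, and adaptive methods compared in the article need more machinery and are not sketched here.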