Bayesian adaptive nearest neighbor
Author(s) - Guo Ruixin, Chakraborty Sounak
Publication year - 2010
Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.10067
Subject(s) - k nearest neighbors algorithm, computer science, nearest neighbour algorithm, markov chain monte carlo, benchmark (surveying), probabilistic logic, bayesian probability, pattern recognition (psychology), artificial intelligence, data mining, posterior probability, algorithm, geography, travelling salesman problem, geodesy
The k-nearest neighbor (k-NN) classifier is a simple and popular classification method. However, it suffers from a major drawback: it assumes a constant local class posterior probability. It is also highly dependent on, and sensitive to, the choice of the number of neighbors k, and it lacks a proper probabilistic formulation. In this article, we propose a Bayesian adaptive nearest neighbor method (BANN) that adaptively selects both the shape of the neighborhood and the number of neighbors k. The shape of the neighborhood is selected automatically according to the concentration of the data around each query point, with the help of discriminants. The neighborhood size is not predetermined but is left free under a prior distribution, which allows the model to select an appropriate neighborhood size. The model is fitted using Markov chain Monte Carlo (MCMC), so the prediction uses not exactly one neighborhood size but a mixture over k. Our BANN model is highly flexible: it detects local patterns in the data-generating process and adapts to them to improve prediction. We have applied our model to four simulated data sets with special structures and five real-life benchmark data sets. The proposed BANN method demonstrates substantial improvement over k-NN and discriminant adaptive nearest neighbor (DANN) in all nine case studies, and it also outperforms the probabilistic nearest neighbor (PNN) method in most of the data analyses. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 92-105, 2010
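To make the core idea concrete, the sketch below contrasts plain k-NN (a single fixed k) with a toy "mixture over k" predictor that averages class proportions across several neighborhood sizes. This is only an illustration of the averaging idea mentioned in the abstract, not the authors' BANN model: BANN places a prior on k and on the neighborhood shape and samples from the posterior via MCMC, whereas here the weights over k are simply fixed (uniform) and the metric is plain Euclidean distance. All function names and the toy data are assumptions for demonstration.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Plain k-NN: majority vote among the k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dists)[:k]          # indices of the k nearest neighbors
    votes = np.bincount(y_train[idx])    # count class labels in the neighborhood
    return int(np.argmax(votes))

def mixture_knn_predict(X_train, y_train, x, ks, weights=None):
    """Toy 'mixture over k': average class proportions over several
    neighborhood sizes instead of committing to a single k.

    BANN instead samples k (and the neighborhood shape) from a posterior
    via MCMC; the fixed weights here are a crude stand-in for those
    posterior weights."""
    n_classes = int(y_train.max()) + 1
    if weights is None:
        weights = np.full(len(ks), 1.0 / len(ks))  # uniform "prior" over k
    dists = np.linalg.norm(X_train - x, axis=1)
    order = np.argsort(dists)                      # neighbors sorted by distance
    probs = np.zeros(n_classes)
    for k, w in zip(ks, weights):
        counts = np.bincount(y_train[order[:k]], minlength=n_classes)
        probs += w * counts / k                    # class proportions at this k
    return int(np.argmax(probs)), probs

# Toy usage: two well-separated clusters.
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
label, probs = mixture_knn_predict(X, y, np.array([0.2, 0.2]), ks=[1, 3, 5])
```

Averaging over k in this way smooths the hard dependence on a single neighborhood size, which is the sensitivity the abstract criticizes in standard k-NN; the full BANN model goes further by learning those weights (and an adaptive, discriminant-based neighborhood shape) from the data.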