Bump hunting by topological data analysis
Author(s) - Max Sommerfeld, Giseon Heo, Peter Kim, Stephen T. Rush, J. S. Marron
Publication year - 2017
Publication title - Stat
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.61
H-Index - 18
ISSN - 2049-1573
DOI - 10.1002/sta4.167
Subject(s) - statistical inference, persistent homology, topological data analysis, inference, kernel density estimation, computer science, kernel (algebra), data set, statistical hypothesis testing, statistical analysis, algorithm, mathematics, data mining, topology (electrical circuits), statistics, discrete mathematics, artificial intelligence, combinatorics, estimator
A topological data analysis approach is taken to the challenging problem of finding and validating the statistical significance of local modes in a data set. As with the SIgnificance of the ZERo (SiZer) approach to this problem, statistical inference is performed in a multi‐scale way, that is, across bandwidths. The key contribution is a two‐parameter approach to the persistent homology representation. For each kernel bandwidth, a sub‐level set filtration of the resulting kernel density estimate is computed. Inference based on the resulting persistence diagram indicates statistical significance of modes. It is seen through a simulated example, and by analysis of the famous Hidalgo stamps data, that the new method has more statistical power for finding bumps than SiZer. Copyright © 2017 John Wiley & Sons, Ltd.
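The two-parameter idea in the abstract (one persistence computation per kernel bandwidth) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the data are one-dimensional, the kernel is Gaussian, the sub-level set filtration is applied to the negated density estimate so that each local mode gives birth to a component, and the global mode is assigned a persistence equal to its height. The function names and the bandwidth values are hypothetical, and the inference step that decides which modes are statistically significant is not reproduced here.

```python
import numpy as np


def gaussian_kde_on_grid(data, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    norm = len(data) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm


def sublevel_persistence_0d(values):
    """0-dimensional (birth, death) pairs of the sub-level set filtration
    of a function sampled on a 1-D grid, computed with union-find.
    The component born at the global minimum never dies (death = inf)."""
    n = len(values)
    order = np.argsort(values, kind="stable")
    parent = [-1] * n      # -1 marks grid points not yet in the filtration
    birth = {}             # root index -> birth value of its component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    pairs = []
    for i in order:
        i = int(i)
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later dies at this level.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:    # drop zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    pairs.append((values[order[0]], np.inf))
    return pairs


def mode_persistences(data, grid, bandwidth):
    """Persistence of each mode of the KDE, largest first.
    Assumption: the filtration is taken on -f, and the global mode's
    infinite death is capped at 0, so its persistence is its height."""
    f = gaussian_kde_on_grid(data, grid, bandwidth)
    pairs = sublevel_persistence_0d(-f)
    return sorted((min(d, 0.0) - b for b, d in pairs), reverse=True)


# Illustrative data: two well-separated clusters (not from the paper).
data = np.concatenate([np.linspace(-2.5, -1.5, 50), np.linspace(1.5, 2.5, 50)])
grid = np.linspace(-5.0, 5.0, 501)
for h in (0.3, 1.0, 3.0):   # hypothetical bandwidths
    pers = mode_persistences(data, grid, h)
    print(h, [round(float(p), 4) for p in pers[:3]])
```

Scanning the bandwidth in the outer loop gives one list of persistences (one persistence diagram) per scale, which is the multi-scale view the abstract shares with SiZer. A small bandwidth keeps both clusters as high-persistence modes, while a large one merges them into a single bump.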