High-Dimensional Cluster Analysis with the Masked EM Algorithm
Author(s) - Shabnam Kadir, Dan F. M. Goodman, Kenneth D. Harris
Publication year - 2014
Publication title - Neural Computation
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.235
H-Index - 169
eISSN - 1530-888X
pISSN - 0899-7667
DOI - 10.1162/neco_a_00661
Subject(s) - overfitting, cluster analysis, computer science, curse of dimensionality, pattern recognition (psychology), feature selection, algorithm, spike sorting, clustering high dimensional data, feature vector, generalization, artificial intelligence, sorting, dimensionality reduction, data mining, artificial neural network, mathematics, mathematical analysis
Cluster analysis faces two problems in high dimensions: the "curse of dimensionality" that can lead to overfitting and poor generalization performance and the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of spike sorting for next-generation, high-channel-count neural probes. In this problem, only a small subset of features provides information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a "masked EM" algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data and to real-world high-channel-count spike sorting data.