Geometric consistency of principal component scores for high‐dimensional mixture models and its application
Author(s) - Yata Kazuyoshi, Aoshima Makoto
Publication year - 2020
Publication title - Scandinavian Journal of Statistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.359
H-Index - 65
eISSN - 1467-9469
pISSN - 0303-6898
DOI - 10.1111/sjos.12432
Subject(s) - principal component analysis , mathematics , cluster analysis , consistency (knowledge bases) , context (archaeology) , representation (politics) , dimension (graph theory) , pattern recognition (psychology) , artificial intelligence , statistics , computer science , combinatorics , geometry , law , biology , paleontology , politics , political science
In this article, we consider clustering based on principal component analysis (PCA) for high‐dimensional mixture models. We present theoretical reasons why PCA is effective for clustering high‐dimensional data. First, we derive a geometric representation of high‐dimension, low‐sample‐size (HDLSS) data taken from a two‐class mixture model. With the help of the geometric representation, we give geometric consistency properties of sample principal component scores in the HDLSS context. We develop ideas of the geometric representation and provide geometric consistency properties for multiclass mixture models. We show that PCA can cluster HDLSS data under certain conditions in a surprisingly explicit way. Finally, we demonstrate the performance of the clustering using gene expression datasets.
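The idea of clustering HDLSS data by principal component scores can be illustrated with a small simulation. The sketch below is not the authors' procedure and does not check the conditions stated in the article. It assumes a two-class Gaussian mixture with identity covariance, equal class sizes, and a mean shift of 1 in every coordinate. The function names `pc1_scores` and `cluster_by_sign` are hypothetical. First PC scores are computed from the n x n dual matrix, which is cheap when the dimension far exceeds the sample size, and each observation is assigned to a cluster by the sign of its score.

```python
import numpy as np


def pc1_scores(X):
    """First principal component scores of the rows of X (n x d).

    Uses the n x n dual (Gram) matrix instead of the d x d sample
    covariance matrix, which is much smaller when d >> n.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)              # center each coordinate
    S_dual = Xc @ Xc.T / (n - 1)         # dual sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S_dual)  # eigenvalues in ascending order
    lam1 = eigvals[-1]                   # largest eigenvalue
    u1 = eigvecs[:, -1]                  # matching unit eigenvector
    # Scores on the first PC direction equal sqrt((n - 1) * lam1) * u1.
    return np.sqrt((n - 1) * lam1) * u1


def cluster_by_sign(X):
    """Assign each row of X to cluster 0 or 1 by the sign of its first PC score."""
    return (pc1_scores(X) > 0).astype(int)


# Assumed toy setting: d = 2000, n = 20, two classes of 10 observations each.
rng = np.random.default_rng(0)
n1, n2, d = 10, 10, 2000
mu = np.ones(d)                          # assumed mean difference between classes
X = np.vstack([
    rng.standard_normal((n1, d)),        # class 0: N(0, I)
    rng.standard_normal((n2, d)) + mu,   # class 1: N(mu, I)
])
labels_true = np.array([0] * n1 + [1] * n2)

labels = cluster_by_sign(X)
# The sign of an eigenvector is arbitrary, so compare up to label swapping.
agreement = max(np.mean(labels == labels_true), np.mean(labels != labels_true))
print(f"agreement with true classes: {agreement:.2f}")
```

In this assumed setting the between-class signal dominates the noise in the dual matrix, so the first PC scores fall into two well-separated groups. Weaker mean shifts, unequal covariances, or more than two classes require the conditions and extensions developed in the article.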