Supervised t-Distributed Stochastic Neighbor Embedding for Data Visualization and Classification
Author(s) - Yichen Cheng, Xinlei Wang, Yusen Xia
Publication year - 2020
Publication title - INFORMS Journal on Computing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.403
H-Index - 80
eISSN - 1526-5528
pISSN - 1091-9856
DOI - 10.1287/ijoc.2020.0961
Subject(s) - embedding, dimensionality reduction, visualization, computer science, dimension (graph theory), divergence (linguistics), pattern recognition (psychology), reduction (mathematics), data mining, feature (linguistics), clustering high dimensional data, artificial intelligence, kullback–leibler divergence, cluster analysis, mathematics, linguistics, philosophy, geometry, pure mathematics
We propose a novel supervised dimension reduction method, called supervised t-distributed stochastic neighbor embedding (St-SNE), which achieves dimension reduction by preserving the similarities of data points in both the feature and outcome spaces. The proposed method can be used for both prediction and visualization tasks and can handle high-dimensional data. We show through a variety of datasets that, compared with a comprehensive list of existing methods, St-SNE has superior prediction performance in the ultra-high-dimensional setting, where the number of features p exceeds the sample size n, and competitive performance in the p ≤ n setting. We also show that St-SNE is a competitive visualization tool capable of capturing within-cluster variation. In addition, we propose a penalized Kullback–Leibler divergence criterion to automatically select the reduced dimension size k for St-SNE.
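The abstract does not give St-SNE's formulas, so the following is only a minimal Python/NumPy sketch of the standard t-SNE ingredients the method builds on: Gaussian affinities p_ij in the input space, Student t affinities q_ij in a k-dimensional embedding, the Kullback–Leibler divergence KL(P || Q), and its usual gradient. The function `supervised_affinities`, which mixes feature-space and outcome-space affinities with a weight `alpha`, is a hypothetical stand-in for "preserving similarities in both feature and outcome spaces", not the authors' construction. The fixed bandwidths `sigma_x` and `sigma_y`, the step size, and all function names are likewise assumptions, and the penalized criterion for choosing k is not sketched.

```python
import numpy as np


def gaussian_joint_affinities(Z, sigma):
    """Symmetric joint affinities p_ij from a Gaussian kernel with one fixed bandwidth.

    Standard t-SNE calibrates a per-point bandwidth to a target perplexity;
    the single fixed `sigma` used here is a simplifying assumption.
    """
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]  # treat a 1-D outcome vector as an (n, 1) matrix
    n = Z.shape[0]
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)  # squared distances
    K = np.exp(-sq / (2.0 * sigma**2))
    np.fill_diagonal(K, 0.0)  # a point is not its own neighbor
    cond = K / K.sum(axis=1, keepdims=True)  # conditional p_{j|i}; each row sums to 1
    return (cond + cond.T) / (2.0 * n)  # symmetrized p_ij; the whole matrix sums to 1


def supervised_affinities(X, y, sigma_x, sigma_y, alpha=0.5):
    """HYPOTHETICAL: convex mixture of feature-space and outcome-space affinities.

    An illustrative stand-in only, not the St-SNE construction from the paper.
    """
    P_x = gaussian_joint_affinities(X, sigma_x)
    P_y = gaussian_joint_affinities(y, sigma_y)
    return alpha * P_x + (1.0 - alpha) * P_y


def student_t_affinities(Y):
    """Low-dimensional affinities q_ij with a Student t kernel (one degree of freedom)."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + sq)  # unnormalized kernel w_ij
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), summed over pairs with p_ij > 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


def embed(P, k=2, n_iter=300, lr=None, seed=0):
    """Plain gradient descent on KL(P || Q) in k dimensions.

    Uses the standard t-SNE gradient
        dC/dy_i = 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j).
    No momentum or early exaggeration; the default step size n / 12 is a
    conservative assumption. Returns the embedding and the KL value per step.
    """
    n = P.shape[0]
    lr = n / 12.0 if lr is None else lr
    rng = np.random.default_rng(seed)
    Y = 1e-2 * rng.standard_normal((n, k))  # small random initialization
    history = []
    for _ in range(n_iter):
        Q, W = student_t_affinities(Y)
        history.append(kl_divergence(P, Q))
        M = (P - Q) * W
        grad = 4.0 * (M.sum(axis=1, keepdims=True) * Y - M @ Y)
        Y = Y - lr * grad
    Q, _ = student_t_affinities(Y)
    history.append(kl_divergence(P, Q))
    return Y, history


# Toy data with p = 50 features and n = 30 samples (p > n), two classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (15, 50)), rng.normal(3.0, 1.0, (15, 50))])
y = np.repeat([0.0, 1.0], 15)
P = supervised_affinities(X, y, sigma_x=5.0, sigma_y=0.5)
Y_emb, history = embed(P, k=2)
print(f"KL at start: {history[0]:.4f}, KL at end: {history[-1]:.4f}")
```

The toy data deliberately have more features than samples (p = 50, n = 30). Setting `alpha=1.0` drops the outcome term and reduces the sketch to an unsupervised, fixed-bandwidth t-SNE, which gives a simple baseline for comparing the effect of the outcome-space affinities.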