Open Access
Deep soft K-means clustering with self-training for single-cell RNA sequence data
Author(s) -
Liang Chen,
Weinan Wang,
Yuyao Zhai,
Minghua Deng
Publication year - 2020
Publication title -
NAR Genomics and Bioinformatics
Language(s) - English
Resource type - Journals
ISSN - 2631-9268
DOI - 10.1093/nargab/lqaa039
Subject(s) - cluster analysis , computer science , autoencoder , dimensionality reduction , scalability , artificial intelligence , population , data mining , pattern recognition (psychology) , robustness (evolution) , deep learning , biology , biochemistry , demography , database , sociology , gene
Single-cell RNA sequencing (scRNA-seq) allows researchers to study heterogeneity at the cellular level. A crucial step in analyzing scRNA-seq data is clustering cells into subpopulations to facilitate downstream analysis. However, frequent dropout events and the increasing size of scRNA-seq datasets make clustering such high-dimensional, sparse and massive transcriptional expression profiles challenging. Although some existing deep learning-based clustering algorithms for single cells combine dimensionality reduction with clustering, they either ignore the distance and affinity constraints between similar cells or make additional latent-space assumptions, such as a Gaussian mixture distribution, and therefore fail to learn a cluster-friendly low-dimensional space. In this paper, we use a deep denoising autoencoder to characterize scRNA-seq data and propose a soft self-training K-means algorithm to cluster the cell population in the learned latent space. The self-training procedure effectively aggregates similar cells and pursues a more cluster-friendly latent space. Our method, called ‘scziDesk’, iteratively alternates between data compression, data reconstruction and soft clustering, and it shows excellent compatibility and robustness on both simulated and real data. Moreover, the method scales well with the number of cells on large-scale datasets.
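The soft self-training K-means idea can be illustrated with a small sketch. It assumes that soft assignments are a softmax over negative squared distances to the cluster centers (scaled by a hypothetical temperature `alpha`) and that self-training uses a DEC-style sharpened target distribution. These choices and all function names are assumptions for illustration, not scziDesk's published formulation. The denoising autoencoder is omitted: the latent embeddings `z` are held fixed and only the centers are updated.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_assign(z: Matrix, centers: Matrix, alpha: float = 1.0) -> Matrix:
    """Soft assignment w_ik proportional to exp(-alpha * ||z_i - mu_k||^2), rows sum to 1."""
    weights = []
    for zi in z:
        logits = [-alpha * sq_dist(zi, mu) for mu in centers]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def target_distribution(w: Matrix) -> Matrix:
    """Sharpened target (DEC-style assumption): p_ik ~ w_ik^2 / f_k, f_k = sum_i w_ik."""
    k = len(w[0])
    freq = [sum(row[j] for row in w) for j in range(k)]
    targets = []
    for row in w:
        num = [row[j] ** 2 / freq[j] for j in range(k)]
        total = sum(num)
        targets.append([v / total for v in num])
    return targets


def update_centers(z: Matrix, p: Matrix) -> Matrix:
    """Each center becomes the p-weighted mean of the embeddings."""
    n, d, k = len(z), len(z[0]), len(p[0])
    centers = []
    for j in range(k):
        wsum = sum(p[i][j] for i in range(n))
        centers.append([sum(p[i][j] * z[i][c] for i in range(n)) / wsum for c in range(d)])
    return centers


def soft_kmeans_self_training(
    z: Matrix, centers: Matrix, alpha: float = 1.0, n_iter: int = 20
) -> Tuple[Matrix, List[int]]:
    """Alternate soft assignment, target sharpening and center updates on fixed embeddings."""
    for _ in range(n_iter):
        w = soft_assign(z, centers, alpha)
        p = target_distribution(w)
        centers = update_centers(z, p)
    final = soft_assign(z, centers, alpha)
    labels = [max(range(len(row)), key=row.__getitem__) for row in final]
    return centers, labels


if __name__ == "__main__":
    # Toy "latent space": two well-separated groups of three cells each.
    z = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    centers, labels = soft_kmeans_self_training(z, [[0.0, 0.0], [5.0, 5.0]])
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Squaring the soft weights before renormalizing pushes each cell's target toward its most likely cluster. This is one common way to let a clustering signal "self-train" on its own confident assignments. In a full deep-clustering model, such targets would typically also drive gradient updates of the encoder, which this sketch does not attempt.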
