
Application of HDBSCAN Method for Clustering scRNA-seq Data
Author(s) -
Maria Andreevna Akimenkova,
Anna A. Maznina,
Anton Yurievich Naumov,
Evgeny Karpulevich
Publication year - 2020
Publication title -
Trudy Instituta sistemnogo programmirovaniâ RAN
Language(s) - English
Resource type - Journals
eISSN - 2220-6426
pISSN - 2079-8156
DOI - 10.15514/ispras-2020-32(5)-8
Subject(s) - cluster analysis , hierarchical clustering , computer science , data mining , preprocessor , imputation (statistics) , single linkage clustering , fuzzy clustering , consensus clustering , feature selection , clustering high dimensional data , artificial intelligence , pattern recognition (psychology) , cure data clustering algorithm , machine learning , missing data
One of the main tasks in the analysis of single-cell RNA sequencing (scRNA-seq) data is the identification of cell types and subtypes, which is usually based on some method of clustering. There are a number of generally accepted approaches to the clustering problem, one of which is implemented in the Seurat package. The quality of clustering is also influenced by preprocessing algorithms such as imputation, dimensionality reduction, and feature selection. In this article, the HDBSCAN hierarchical clustering method is used to cluster scRNA-seq data. For a more complete comparison, experiments were run on two labeled datasets: Zeisel (3005 cells) and Romanov (2881 cells). Clustering quality was compared using two external metrics: the Adjusted Rand index and V-measure. The experiments demonstrated a higher quality of clustering by the HDBSCAN method on the Zeisel dataset and a poorer quality on the Romanov dataset.