Open Access
Joint Clustering of Single-Cell Sequencing and Fluorescence In Situ Hybridization Data for Reconstructing Clonal Heterogeneity in Cancers
Author(s) -
Xuecong Fu,
Hongwei Lei,
Yifeng Tao,
Kerstin Heselmeyer-Haddad,
Irianna Torres,
Michael Dean,
Thomas Ried,
Russell Schwartz
Publication year - 2021
Publication title -
Journal of Computational Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.585
H-Index - 95
eISSN - 1557-8666
pISSN - 1066-5277
DOI - 10.1089/cmb.2021.0255
Subject(s) - biology, single cell sequencing, somatic evolution in cancer, ploidy, computational biology, comparative genomic hybridization, cluster analysis, phylogenetic tree, fluorescence in situ hybridization, copy number variation, evolutionary biology, genome, genetics, cancer, exome sequencing, chromosome, computer science, mutation, gene, artificial intelligence
Aneuploidy and whole genome duplication (WGD) events are common features of cancers and are associated with poor outcomes, but the ways they influence trajectories of clonal evolution are poorly understood. Phylogenetic methods for reconstructing clonal evolution from genomic data have proven to be powerful tools for understanding how clonal evolution proceeds during cancer progression, but extant methods have so far had limited ability to resolve tumor evolution via ploidy changes. This limitation exists in part because single-cell DNA sequencing (scSeq), which has been crucial to developing detailed profiles of clonal evolution, has difficulty resolving ploidy changes and WGD. Multiplex interphase fluorescence in situ hybridization (miFISH) provides a less ambiguous signal of single-cell ploidy changes, but it is limited to profiling a small number of markers. Here, we develop a joint clustering method that combines these two data sources with the goal of better resolving ploidy changes in tumor evolution. We develop a probabilistic framework to maximize the probability of latent variables given the pre-clustered datasets, which we optimize via Markov chain Monte Carlo sampling combined with linear regression. We validate the method using simulated data derived from a glioblastoma (GBM) case profiled by both scSeq and miFISH. We further apply the method to two GBM cases with scSeq and miFISH data, reconstructing a phylogenetic tree from the joint clustering results and demonstrating the synergistic value of the two data types in understanding how focal copy number changes and WGD events can collectively contribute to tumor progression.
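The abstract only outlines the optimization: latent variables linking pre-clustered scSeq and miFISH data are optimized by Markov chain Monte Carlo sampling combined with linear regression. The toy sketch below illustrates one way such a combination could work; it is not the authors' model. Every modeling choice here is a hypothetical assumption: each miFISH cluster is matched to one scSeq cluster through a latent assignment `z`; scSeq copy-number profiles are treated as known only up to a scale factor, which a regression through the origin estimates for each matched pair (a fitted scale near 2 would hint at a WGD); the score is a Gaussian log-likelihood with an arbitrary noise level `sigma`; and assignments are explored with a Metropolis sampler. The names `fit_scale`, `log_score` and `mcmc_match` are illustrative only.

```python
import numpy as np


def fit_scale(x, y):
    """Least-squares scale s minimizing ||y - s*x||^2 (regression through the origin)."""
    denom = float(x @ x)
    return float(x @ y) / denom if denom > 0 else 0.0


def log_score(z, scseq, fish, sigma=0.5):
    """Toy Gaussian log-likelihood (up to a constant) of miFISH cluster profiles
    given assignment z, using a per-pair scale fitted by linear regression."""
    total = 0.0
    for j, k in enumerate(z):
        s = fit_scale(scseq[k], fish[j])
        resid = fish[j] - s * scseq[k]
        total -= float(resid @ resid) / (2 * sigma ** 2)
    return total


def mcmc_match(scseq, fish, n_iter=2000, seed=0):
    """Metropolis search over assignments of miFISH clusters to scSeq clusters.
    Returns the best assignment found and the fitted scale for each matched pair."""
    rng = np.random.default_rng(seed)
    n_seq, n_fish = scseq.shape[0], fish.shape[0]
    z = rng.integers(n_seq, size=n_fish)
    cur = log_score(z, scseq, fish)
    best_z, best = z.copy(), cur
    for _ in range(n_iter):
        # Propose reassigning one randomly chosen miFISH cluster.
        prop = z.copy()
        prop[rng.integers(n_fish)] = rng.integers(n_seq)
        new = log_score(prop, scseq, fish)
        # Metropolis acceptance: always accept improvements, sometimes accept worse moves.
        if np.log(rng.random()) < new - cur:
            z, cur = prop, new
            if cur > best:
                best_z, best = z.copy(), cur
    scales = [fit_scale(scseq[k], fish[j]) for j, k in enumerate(best_z)]
    return [int(k) for k in best_z], scales


# Hypothetical data: 3 scSeq clusters and 3 miFISH clusters over 4 probe loci.
scseq = np.array([[2, 2, 3, 2], [2, 1, 2, 2], [3, 2, 2, 4]], dtype=float)
# miFISH cluster 1 is scSeq cluster 0 after a simulated doubling.
fish = np.array([[3, 2, 2, 4], [4, 4, 6, 4], [2, 1, 2, 2]], dtype=float)
z, scales = mcmc_match(scseq, fish)
print(z, scales)
```

On this made-up input the sampler should match the miFISH clusters to scSeq clusters 2, 0 and 1, with a fitted scale of 2 for the simulated doubled cluster. In this toy the score decomposes over miFISH clusters, so exhaustive enumeration would also work; the sampler is used only to mirror the MCMC-plus-regression structure the abstract describes.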