Using sufficient direction factor model to analyze latent activities associated with breast cancer survival
Author(s) - Baek Seungchul, Ho YenYi, Ma Yanyuan
Publication year - 2020
Publication title - Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/biom.13208
Subject(s) - censoring (clinical trials) , breast cancer , context (archaeology) , computer science , correlation , dimension (graph theory) , survival analysis , data mining , computational biology , cancer , econometrics , statistics , biology , mathematics , genetics , paleontology , geometry , pure mathematics
High‐dimensional gene expression data often exhibit intricate correlation patterns as a result of coordinated genetic regulation. In practice, however, it is difficult to measure these coordinated underlying activities directly. Analysis of breast cancer survival data with gene expressions motivates us to use a two‐stage latent factor approach to estimate these unobserved coordinated biological processes. Compared to existing approaches, our proposed procedure has several unique characteristics. First, in the first stage, it incorporates prior biological knowledge about gene‐pathway membership into the analysis and explicitly models the effects of genetic pathways on the latent factors. Second, to characterize the molecular heterogeneity of breast cancer, our approach provides estimates specific to each cancer subtype. Finally, our proposed framework incorporates a sparsity condition, because genetic networks are often sparse. In the second stage, we investigate the relationship between latent factor activity levels and censored survival time using a general dimension reduction model in the survival analysis context. Combining the factor model and the sufficient direction model provides an efficient way of analyzing high‐dimensional data and reveals some interesting relations in the breast cancer gene expression data.
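
The two‐stage idea described in the abstract can be illustrated with a minimal sketch on simulated data. This is not the authors' estimator: stage one here uses plain principal components as a stand‐in for the pathway‐informed sparse factor model, and stage two uses a Cox proportional hazards fit as a stand‐in for the general sufficient direction model. All function names, dimensions, and simulation settings below are assumptions made for illustration only.

```python
import numpy as np

def simulate(n=200, p=50, k=3, seed=0):
    """Simulate expression data driven by k latent factors, plus censored survival times."""
    rng = np.random.default_rng(seed)
    factors = rng.normal(size=(n, k))
    loadings = rng.normal(size=(p, k))
    loadings[rng.random(size=(p, k)) < 0.7] = 0.0   # sparse loadings (assumed pattern)
    X = factors @ loadings.T + 0.5 * rng.normal(size=(n, p))
    beta = np.array([1.0, -1.0, 0.0])               # hypothetical effect of factors on hazard
    event_time = rng.exponential(scale=np.exp(-(factors @ beta)))
    censor_time = rng.exponential(scale=2.0, size=n)
    time = np.minimum(event_time, censor_time)      # observed follow-up time
    event = event_time <= censor_time               # True if the event was observed
    return X, time, event

def estimate_factors(X, k):
    """Stage 1 stand-in: principal-component estimate of k latent factor scores."""
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * np.sqrt(X.shape[0])           # scaled so that F.T @ F / n = I

def cox_fit(Z, time, event, n_iter=25):
    """Stage 2 stand-in: Newton iterations on the Cox partial likelihood (no ties assumed)."""
    order = np.argsort(time)
    Z, event = Z[order], event[order]
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        w = np.exp(Z @ beta)
        # Reverse cumulative sums give sums over each risk set {j : t_j >= t_i}.
        s0 = np.cumsum(w[::-1])[::-1]
        s1 = np.cumsum((w[:, None] * Z)[::-1], axis=0)[::-1]
        zz = Z[:, :, None] * Z[:, None, :]
        s2 = np.cumsum((w[:, None, None] * zz)[::-1], axis=0)[::-1]
        mean = s1 / s0[:, None]
        grad = (Z[event] - mean[event]).sum(axis=0)
        m = mean[event]
        info = (s2[event] / s0[event][:, None, None]
                - m[:, :, None] * m[:, None, :]).sum(axis=0)
        beta = beta + np.linalg.solve(info, grad)
    return beta

X, time, event = simulate()
F_hat = estimate_factors(X, k=3)
beta_hat = cox_fit(F_hat, time, event)
```

Because latent factors are identified only up to rotation, the coefficients in `beta_hat` refer to the estimated factor axes rather than to the simulated ones; only the fitted linear predictor `F_hat @ beta_hat` is directly interpretable in this sketch.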