Unified methods for feature selection in large-scale genomic studies with censored survival outcomes

Lauren SpirkoBurns; Karthik Devarajan

Publication year - 2020

Publication title - Bioinformatics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.599

H-Index - 390

eISSN - 1367-4811

pISSN - 1367-4803

DOI - 10.1093/bioinformatics/btaa161

One of the major goals of large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes, as such genes provide insight into the disease process. With rapid developments in high-throughput genomic technologies over the past two decades, researchers can now monitor the expression levels of tens of thousands of genes and proteins, producing enormous datasets in which the number of genomic features far exceeds the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards (PH), an assumption that is unlikely to hold for every feature. When these methods are applied to features exhibiting some form of non-proportional hazards (NPH), they can under- or overestimate the features' effects. We propose a broad array of marginal screening techniques that aid in feature ranking and selection by accommodating various forms of NPH. First, we develop an approach based on Kullback-Leibler information divergence and the Yang-Prentice model that includes methods for the PH and proportional odds (PO) models as special cases. Next, we propose R² measures for the PH and PO models that can be interpreted in terms of explained randomness. Finally, we propose a generalized pseudo-R² index that includes the PH, PO, crossing-hazards and crossing-odds models as special cases and can be interpreted as the percentage of separability between subjects who experience the event and those who do not, according to the feature measurements.
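As a rough illustration of the setting described above, the sketch below shows the baseline the abstract refers to, marginal screening with univariate Cox regression, alongside the Yang-Prentice hazard ratio that nests the PH and PO models. It is a minimal sketch under stated assumptions, not the authors' Kullback-Leibler or R² procedures: each feature is ranked by the Cox partial-likelihood score statistic at β = 0 (Breslow handling of ties), and the Yang-Prentice hazard ratio is written in the parameterization of Yang and Prentice (2005), θ1θ2 / (θ1 + (θ2 - θ1)S0(t)), where θ1 and θ2 are the short- and long-term hazard ratios. The function names (`cox_score_statistic`, `rank_features`, `yang_prentice_hazard_ratio`) and the simulated data are hypothetical and chosen for illustration only.

```python
import numpy as np


def cox_score_statistic(time, event, x):
    """Univariate Cox partial-likelihood score statistic U**2 / I at beta = 0.

    Ties are handled with the Breslow convention: every event at time t is
    compared against the same risk set {j : time_j >= t}. Under the null of
    no association the statistic is approximately chi-square with 1 df.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    x = np.asarray(x, dtype=float)
    score, info = 0.0, 0.0
    for i in np.flatnonzero(event):
        xr = x[time >= time[i]]                 # covariate values in the risk set
        mean_r = xr.mean()
        score += x[i] - mean_r                  # observed minus risk-set mean
        info += (xr ** 2).mean() - mean_r ** 2  # risk-set variance of x
    return score ** 2 / info if info > 0 else 0.0


def rank_features(time, event, X):
    """Marginal (one-feature-at-a-time) screening of the columns of X."""
    X = np.asarray(X, dtype=float)
    stats = np.array([cox_score_statistic(time, event, X[:, k])
                      for k in range(X.shape[1])])
    return np.argsort(stats)[::-1], stats  # feature indices, largest statistic first


def yang_prentice_hazard_ratio(theta1, theta2, s0):
    """h(t | z) / h0(t) under the Yang-Prentice model.

    theta1: short-term hazard ratio (the value when S0(t) = 1, i.e. t -> 0)
    theta2: long-term hazard ratio (the limit as S0(t) -> 0)
    s0:     baseline survival S0(t), values in [0, 1]
    """
    s0 = np.asarray(s0, dtype=float)
    return theta1 * theta2 / (theta1 + (theta2 - theta1) * s0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 50
    X = rng.standard_normal((n, p))
    # Hypothetical PH effect on feature 0 only: hazard exp(1.5 * x0), i.e. scale exp(-1.5 * x0)
    true_time = rng.exponential(np.exp(-1.5 * X[:, 0]))
    cens_time = rng.exponential(2.0, size=n)
    order, stats = rank_features(np.minimum(true_time, cens_time),
                                 true_time <= cens_time, X)
    print("top-ranked features:", order[:5])

    s0 = np.linspace(1.0, 0.0, 5)
    print("PH (theta1 = theta2 = 2):", yang_prentice_hazard_ratio(2.0, 2.0, s0))
    print("PO (theta2 = 1):         ", yang_prentice_hazard_ratio(2.0, 1.0, s0))
    print("crossing (2 -> 0.5):     ", yang_prentice_hazard_ratio(2.0, 0.5, s0))
```

Setting θ1 = θ2 gives a constant hazard ratio (PH), θ2 = 1 gives the PO model, and θ1 and θ2 on opposite sides of 1 give crossing hazards. In the crossing case, early and late contributions to a PH-based score statistic can partly cancel, which is one way a feature with a real but non-proportional effect can be under-ranked by Cox-based screening.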