Tailoring sparse multivariable regression techniques for prognostic single‐nucleotide polymorphism signatures
Author(s) - Binder H., Benner A., Bullinger L., Schumacher M.
Publication year - 2012
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.5490
Subject(s) - covariate, computer science, single nucleotide polymorphism, SNP, regression, minor allele frequency, resampling, data mining, statistics, computational biology, artificial intelligence, machine learning, mathematics, biology, genetics, genotype, gene
When seeking prognostic information for patients, modern technologies provide a huge amount of genomic measurements as a starting point. For single‐nucleotide polymorphisms (SNPs), there may be more than one million covariates that need to be simultaneously considered with respect to a clinical endpoint. Although the underlying biological problem cannot be solved on the basis of clinical cohorts of only modest size, some important SNPs might still be identified. Sparse multivariable regression techniques have recently become available for automatically identifying prognostic molecular signatures that comprise relatively few covariates and provide reasonable prediction performance. For illustrating how such approaches can be adapted to the specific features of SNP data, we propose different variants of a componentwise likelihood‐based boosting approach. The latter links SNP measurements to a time‐to‐event endpoint by a regression model that is built up in a large number of steps. The variants allow for strategic choices in dealing with SNPs that differ in variance because of their variation in minor allele frequencies. In addition, we propose a heuristic that allows computationally efficient handling of millions of covariates, thus opening the door for incorporating SNP × treatment interactions. We illustrate this using data from patients with acute myeloid leukemia. We judge the resulting models according to prediction error curves and using resampling data sets. We obtain increased stability by moving interpretation from the SNP to the gene level. By considering these different aspects, we outline a more general strategy for linking SNP measurements to a time‐to‐event endpoint by means of sparse multivariable regression models. Copyright © 2012 John Wiley & Sons, Ltd.
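
The stepwise model building described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: squared-error loss is swapped in for the Cox partial likelihood used with time-to-event endpoints, and the function name `componentwise_boost`, the `penalty` and `steps` defaults, and the simulated data are all assumptions made for illustration. Each step updates the coefficient of a single SNP, chosen by a penalized score statistic, so whether columns are standardized changes how low-variance (low minor allele frequency) SNPs compete for selection.

```python
import numpy as np


def componentwise_boost(X, y, steps=50, penalty=100.0, standardize=True):
    """Penalized componentwise boosting with squared-error loss.

    Illustrative only: squared-error loss stands in for the Cox partial
    likelihood, and all names and defaults here are assumptions.
    Coefficients are returned on the scale of the (optionally
    standardized) centered covariates.
    """
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                 # center each SNP
    if standardize:
        X = X / X.std(axis=0)              # assumes no monomorphic SNPs
    resid = np.asarray(y, dtype=float) - np.mean(y)
    beta = np.zeros(X.shape[1])
    col_ss = (X ** 2).sum(axis=0)          # per-SNP sum of squares
    for _ in range(steps):
        score = X.T @ resid                # association with current residuals
        # Penalized score statistic: without standardization, low-variance
        # (low-MAF) SNPs are penalized relatively more.
        stat = score ** 2 / (col_ss + penalty)
        j = int(np.argmax(stat))
        step = score[j] / (col_ss[j] + penalty)  # shrunken one-covariate update
        beta[j] += step
        resid -= step * X[:, j]
    return beta


# Hypothetical simulated genotypes (0/1/2 minor-allele counts), not study data.
rng = np.random.default_rng(0)
maf = np.linspace(0.1, 0.5, 20)
X = rng.binomial(2, maf, size=(200, 20))
y = 1.0 * X[:, 3] + rng.normal(scale=0.5, size=200)
beta = componentwise_boost(X, y)
```

Because at most one coefficient changes per step, the number of nonzero coefficients is bounded by the number of steps, which is what keeps the fitted signature sparse. Calling the function with `standardize=False` gives the alternative treatment of SNP variance under the same penalty.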
