Dealing with high dimensionality for the identification of common and rare variants as main effects and for gene‐environment interaction
Author(s) -
Bickeböller Heike,
Houwing-Duistermaat Jeanine J.,
Wang Xuefeng,
Yan Xiting
Publication year - 2011
Publication title -
Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.20647
Subject(s) - multifactor dimensionality reduction , dimensionality reduction , computer science , bayes' theorem , kernel (algebra) , kernel method , identification (biology) , curse of dimensionality , context (archaeology) , computational biology , statistic , population , artificial intelligence , machine learning , data mining , bayesian probability , biology , genetics , single nucleotide polymorphism , mathematics , statistics , support vector machine , gene , genotype , medicine , paleontology , botany , environmental health , combinatorics
In addition to genome‐wide association studies, sequence data are now becoming available, increasing the need for even more effective methods of dealing with high dimensionality and of identifying variants beyond common‐variant main effects. The contributors to Genetic Analysis Workshop 17 Group 4 applied novel and recently proposed methods for handling population structure, high dimensionality, and gene‐environment interactions in the context of mini‐exome sequence data. For collapsing rare variants into gene summaries, some contributions considered the computationally fast, straightforward summing of all rare variants or of particular subsets of them. Other methods were comparatively time‐consuming and complex but offered a data‐driven approach, such as reducing the subset of rare variants to be considered by means of a U statistic, or semiparametric modeling of single‐nucleotide polymorphism effects with kernel machines. Several approaches were applied using regression models, regularized regression, and kernels. Testing for gene‐specific main effects and gene‐environment interaction with least‐squares kernel machines proved more flexible and was supervised, compared with a two‐step approach that used a random effects model incorporating an empirical Bayes estimate. However, the random effects model was the only one of the methods, at least in their present form, capable of handling family data. Genet. Epidemiol. 35:S35–S40, 2011. © 2011 Wiley Periodicals, Inc.
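The following is a minimal Python sketch, not the workshop contributors' code, of two ideas named in the abstract. The first is collapsing rare variants into a gene summary by straightforward summing. The second is the score-type statistic Q = rᵀKr that underlies least-squares kernel machine tests, computed here with a linear kernel. The function names, the minor-allele-frequency threshold, the 0/1/2 genotype coding, the choice of a linear kernel, and the intercept-only null model (no covariates or environment term) are illustrative assumptions. The sketch also does not compute a p-value for Q.

```python
import numpy as np


def collapse_rare_variants(genotypes: np.ndarray, maf_threshold: float = 0.01) -> np.ndarray:
    """Hypothetical helper: sum minor-allele counts over the rare variants of one gene.

    genotypes: (n_individuals, n_variants) matrix of minor-allele counts (assumed 0, 1, 2).
    Returns one collapsed burden score per individual.
    """
    # Estimate each variant's minor-allele frequency from the sample itself.
    maf = genotypes.mean(axis=0) / 2.0
    # Flag variants that are polymorphic in the sample but below the assumed threshold.
    rare = (maf > 0) & (maf < maf_threshold)
    # Straightforward summing of all flagged rare variants into a gene summary.
    return genotypes[:, rare].sum(axis=1)


def linear_kernel(genotypes: np.ndarray) -> np.ndarray:
    """Linear kernel K = G G^T: pairwise genetic similarity between individuals."""
    g = genotypes.astype(float)
    return g @ g.T


def kernel_score_statistic(phenotype: np.ndarray, kernel: np.ndarray) -> float:
    """Unscaled score-type statistic Q = r^T K r.

    r holds the residuals under an assumed intercept-only null model,
    i.e. phenotype minus its sample mean. No null distribution is computed here.
    """
    y = np.asarray(phenotype, dtype=float)
    residuals = y - y.mean()
    return float(residuals @ kernel @ residuals)
```

In use, one would call `collapse_rare_variants` on a gene's genotype matrix to obtain a single burden covariate for a regression model. Alternatively, one could pass `linear_kernel(G)` and the phenotype to `kernel_score_statistic`. The kernel route keeps every variant's contribution separate instead of forcing them into one sum. This reflects, in simplified form, the trade-off the abstract draws between fast summing and more flexible but costlier kernel-based modeling.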