Machine learning in genome‐wide association studies
Author(s) -
Szymczak Silke,
Biernacka Joanna M.,
Cordell Heather J.,
González‐Recio Oscar,
König Inke R.,
Zhang Heping,
Sun Yan V.
Publication year - 2009
Publication title -
Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.20473
Subject(s) - single nucleotide polymorphism , snp , genome wide association study , genetic association , genetic architecture , computer science , computational biology , tag snp , selection (genetic algorithm) , genome , regression , feature selection , machine learning , biology , genetics , quantitative trait locus , genotype , statistics , gene , mathematics
Recently, genome‐wide association studies have substantially expanded our knowledge about genetic variants that influence the susceptibility to complex diseases. Although standard statistical tests for each single‐nucleotide polymorphism (SNP) separately are able to capture main genetic effects, different approaches are necessary to identify SNPs that influence disease risk jointly or in complex interactions. Experimental and simulated genome‐wide SNP data provided by the Genetic Analysis Workshop 16 afforded an opportunity to analyze the applicability and benefit of several machine learning methods. Penalized regression, ensemble methods, and network analyses resulted in several new findings while known and simulated genetic risk variants were also identified. In conclusion, machine learning approaches are promising complements to standard single‐ and multi‐SNP analysis methods for understanding the overall genetic architecture of complex human diseases. However, because they are not optimized for genome‐wide SNP data, improved implementations and new variable selection procedures are required. Genet. Epidemiol. 33 (Suppl. 1):S51–S57, 2009. © 2009 Wiley‐Liss, Inc.
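The point that per‐SNP tests capture main effects but can miss SNPs acting jointly can be illustrated with a minimal sketch. It is not one of the methods applied to the Genetic Analysis Workshop 16 data; it is a toy Pearson chi‐square comparison on hypothetical, hand‐constructed data in which disease status depends only on the combination of two binary SNPs (a pure interaction). The names `chi_square`, `snp_a`, `snp_b`, and `disease` are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def chi_square(groups, status):
    """Pearson chi-square statistic for group label vs. case/control status."""
    n = len(status)
    cell = Counter(zip(groups, status))  # observed counts per (group, status)
    row = Counter(groups)                # group totals
    col = Counter(status)                # case/control totals
    stat = 0.0
    for g, s in product(row, col):
        expected = row[g] * col[s] / n
        stat += (cell[(g, s)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: 50 individuals for each of the four carrier
# combinations; an individual is a case only if exactly one SNP is carried.
snp_a, snp_b, disease = [], [], []
for a, b in product((0, 1), repeat=2):
    for _ in range(50):
        snp_a.append(a)
        snp_b.append(b)
        disease.append(a ^ b)

# Single-SNP tests: each SNP is examined on its own.
single_a = chi_square(snp_a, disease)
single_b = chi_square(snp_b, disease)

# Joint test: the two-SNP genotype combination is the grouping variable.
joint = chi_square(list(zip(snp_a, snp_b)), disease)

print(single_a, single_b, joint)
```

In this constructed case both single‐SNP statistics are zero, while the joint statistic is large. Real genome‐wide data are far noisier and involve vastly more SNP combinations, which is where the variable selection concerns raised above come in.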