Comparison of approaches for machine‐learning optimization of neural networks for detecting gene‐gene interactions in genetic epidemiology
Author(s) -
Motsinger‐Reif Alison A.,
Dudek Scott M.,
Hahn Lance W.,
Ritchie Marylyn D.
Publication year - 2008
Publication title -
Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.20307
Subject(s) - epistasis, artificial neural network, genetic programming, computer science, artificial intelligence, machine learning, scalability, set (abstract data type), genetic architecture, grammatical evolution, single nucleotide polymorphism, computational biology, biology, gene, genetics, genotype, quantitative trait locus, database, programming language
The detection of genotypes that predict common, complex disease is a challenge for human geneticists. The phenomenon of epistasis, or gene‐gene interactions, is particularly problematic for traditional statistical techniques. Additionally, the explosion of genetic information makes exhaustive searches of multilocus combinations computationally infeasible. To address these challenges, neural networks (NN), a pattern recognition method, have been used. One limitation of the NN approach is that its success depends on the architecture of the network. To address this limitation, machine‐learning approaches have been suggested to evolve the best NN architecture for a particular data set. In this study we provide a detailed technical description of the use of grammatical evolution to optimize neural networks (GENN) for use in genetic association studies. We compare the performance of GENN to that of a previous machine‐learning NN application, genetic programming neural networks, in both simulated and real data. We show that GENN greatly outperforms genetic programming neural networks in data sets with a large number of single nucleotide polymorphisms. Additionally, we demonstrate that GENN has high power to detect disease‐risk loci in a range of high‐order epistatic models. Finally, we demonstrate the scalability of the GENN method with increasing numbers of variables, up to 500,000 single nucleotide polymorphisms. Genet. Epidemiol. 2008. © 2008 Wiley‐Liss, Inc.
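To give a sense of why exhaustive multilocus searches become infeasible, the following minimal sketch counts the distinct SNP combinations of a given interaction order. It is only an illustration and not part of the authors' method: the function name `count_multilocus_combinations` is hypothetical, and the count covers only which loci are chosen, not the cost of fitting or evaluating a model for each combination.

```python
from math import comb


def count_multilocus_combinations(n_snps: int, order: int) -> int:
    """Number of distinct SNP subsets of size `order` among `n_snps` loci.

    Hypothetical helper for illustration; it is the binomial coefficient
    C(n_snps, order) and says nothing about per-combination model cost.
    """
    return comb(n_snps, order)


if __name__ == "__main__":
    n_snps = 500_000  # largest SNP count mentioned in the abstract
    for order in (2, 3, 4):
        total = count_multilocus_combinations(n_snps, order)
        print(f"{order}-locus combinations among {n_snps:,} SNPs: {total:.3e}")
```

Even at order 2 the count already exceeds 10^11 for 500,000 SNPs, and each additional locus multiplies it by several orders of magnitude, which is the motivation for evolutionary search strategies over exhaustive enumeration.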