Data mining and computationally intensive methods: Summary of Group 7 contributions to Genetic Analysis Workshop 13
Author(s) - Costello Tracy J., Falk Catherine T., Ye Kenny Q.
Publication year - 2003
Publication title - Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.10285
Subject(s) - computational biology, computer science, data mining, statistics, biology, mathematics
The Framingham Heart Study data, together with a related simulated data set, were generously provided to the participants of Genetic Analysis Workshop 13 so that newly developed and emerging statistical methods could be tested on well‐characterized data. The development of novel methods is driven by the need to elucidate the contributions of genes, environment, and the interactions among them, and to allow methods to be compared and validated. The seven papers that make up this group used data‐mining methodologies (tree‐based methods, neural networks, discriminant analysis, and Bayesian variable selection) in an attempt to identify the genetic basis of cardiovascular disease and related traits in the presence of environmental and genetic covariates. Data‐mining strategies are gaining popularity because they are highly flexible and may be more efficient at, and have greater potential for, identifying the factors involved in complex disorders. The methods grouped together here form a diverse collection: some papers asked similar questions with very different methods, whereas others used the same underlying methodology to ask very different questions. This paper briefly describes the data‐mining methodologies applied to the Genetic Analysis Workshop 13 data sets and the results of those investigations. Genet Epidemiol 25 (Suppl. 1):S57–S63, 2003. © 2003 Wiley‐Liss, Inc.