Analysis of multiple SNPs in genetic association studies: comparison of three multi‐locus methods to prioritize and select SNPs
Author(s) -
Heidema A. Geert,
Feskens Edith J.M.,
Doevendans Pieter A.F.M.,
Ruven Henk J.T.,
van Houwelingen Hans C.,
Mariman Edwin C.M.,
Boer Jolanda M.A.
Publication year - 2007
Publication title -
Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.20251
Subject(s) - single nucleotide polymorphism, multifactor dimensionality reduction, genome wide association study, random forest, genetic association, snp, methylenetetrahydrofolate reductase, biology, population, genetics, computational biology, statistics, mathematics, allele, computer science, medicine, genotype, artificial intelligence, gene, environmental health
Nonparametric approaches have been developed that are able to analyze large numbers of single nucleotide polymorphisms (SNPs) in modest sample sizes. These approaches have different selection features and may not provide similar results when applied to the same dataset. Therefore, we compared the results of three approaches (set association, random forests and multifactor dimensionality reduction [MDR]) in selecting, from a total of 93 candidate SNPs, a subset of SNPs that are important in determining high‐density lipoprotein (HDL)‐cholesterol levels. The study population consisted of a random sample from a Dutch monitoring project for cardiovascular disease risk factors and was dichotomized into cases (low HDL‐cholesterol, n = 533) and non‐cases (high HDL‐cholesterol, n = 545) based on gender‐specific median values for HDL‐cholesterol. All three approaches clearly prioritized three SNPs as important (CETP Taq1B, CETP −629 C/A and LPL Ser447X). Two SNPs with weaker main effects were additionally prioritized by random forests (APOC3 3175 G/C and CCR2 Val62Ile), whereas MTHFR 677 C/T was selected in combination with CETP Taq1B as the best model by MDR. The p‐values obtained for the selected models were significant for the set association approach (p = .0019), random forests (p < .01) and MDR (p < .02). In conclusion, the application of a combination of multi‐locus methods is a useful approach in genetic association studies to select a well‐defined set of important SNPs for further statistical and epidemiological interpretation, providing increased confidence and more information compared with the application of only one method. Genet. Epidemiol. 2007. © 2007 Wiley‐Liss, Inc.
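The case definition described in the abstract (low versus high HDL‐cholesterol relative to a gender‐specific median) can be illustrated with a minimal sketch. This is not the authors' code: the function name, the toy values, and the choice to treat values equal to the median as non‐cases are all assumptions made for illustration, since the abstract does not specify them.

```python
from statistics import median


def dichotomize_by_group_median(records):
    """Label each record as case (1) or non-case (0).

    records: list of (sex, hdl) tuples. A record is a case when its HDL
    value lies below the median of its own sex group. Treating values
    equal to the median as non-cases is an assumption; the abstract
    does not say how ties were handled.
    """
    # Collect HDL values per sex so each group gets its own median.
    by_sex = {}
    for sex, hdl in records:
        by_sex.setdefault(sex, []).append(hdl)
    medians = {sex: median(values) for sex, values in by_sex.items()}

    # Compare every individual against the median of their own group.
    labels = [1 if hdl < medians[sex] else 0 for sex, hdl in records]
    return labels, medians


# Hypothetical toy data (mmol/L), not taken from the study.
toy = [("M", 0.9), ("M", 1.1), ("M", 1.3), ("M", 1.5),
       ("F", 1.2), ("F", 1.4), ("F", 1.6), ("F", 1.8)]
labels, medians = dichotomize_by_group_median(toy)
print(labels)
```

Splitting on a per‐group median rather than a pooled one keeps the two classes roughly balanced within each sex, which is consistent with the near‐equal group sizes reported (n = 533 and n = 545).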