A Simulation Study to Assess a Variable Selection Method for Selecting Single Nucleotide Polymorphisms Associated with Disease
Author(s) - Huwaida Rabie, Ian Saunders
Publication year - 2012
Publication title - Journal of Computational Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.585
H-Index - 95
eISSN - 1557-8666
pISSN - 1066-5277
DOI - 10.1089/cmb.2011.0105
Subject(s) - single nucleotide polymorphism , snp , tag snp , selection (genetic algorithm) , feature selection , false positive paradox , genetic association , snp genotyping , genetics , biology , computational biology , computer science , statistics , genotype , mathematics , artificial intelligence , gene
In genome-wide association studies, where hundreds of thousands of single nucleotide polymorphisms (SNPs) are genotyped, the potential for false positives is high, and methods that select models containing only a few SNPs are required. Variable selection methods that identify sets of SNPs associated with disease have been developed, but they remain less common than evaluating SNPs individually, one at a time. To assess the potential improvement offered by multi-SNP approaches, we examined the performance of the software GeneRaVE as a variable selection method when applied to SNP data from case-control studies. The method was assessed via simulations in which a haplotype defined by three SNPs was taken to be associated with the disease. Simulated data sets reflecting different levels and patterns of genetic association with the disease were generated. As a baseline against which to assess the method, we fitted a generalized linear model using only the three disease susceptibility (DS) SNPs; this provides an upper bound on the possible performance of the selection methods. To investigate the advantage of a multivariate variable selection method over a single-SNP approach, we used chi-squared tests for each of the DS SNPs with correction for multiple testing. Simulation results showed that GeneRaVE performed well and outperformed single-SNP analysis using the chi-squared method in identifying disease-related SNPs. When applied to a large dataset, it identified SNPs known to be associated with disease that were not identified by single-SNP methods.
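The abstract does not specify the simulation model or which multiple-testing correction was used, so the following Python sketch fills both in with labelled assumptions. It simulates case-control genotypes in which carrying a risk haplotype (allele 1 at all three DS SNPs) raises the odds of disease, then runs a single-SNP baseline: a 1-df allelic chi-squared test at every SNP with a Bonferroni correction. The independent-SNP model, allele frequency, odds ratio, sample sizes and all function names are hypothetical, and GeneRaVE itself is not reproduced here.

```python
import math
import random


def simulate_case_control(n_cases, n_controls, n_snps, ds_snps, allele_freq,
                          log_odds_ratio, baseline_logit, seed=0):
    """Simulate genotypes (0, 1 or 2 copies of allele 1) for cases and controls.

    Assumed model (hypothetical): SNPs are independent (no linkage
    disequilibrium) and each haplotype carries allele 1 at each SNP with
    probability allele_freq. A haplotype carrying allele 1 at every DS SNP is
    the risk haplotype; each copy adds log_odds_ratio to the disease logit.
    """
    rng = random.Random(seed)
    cases, controls = [], []
    while len(cases) < n_cases or len(controls) < n_controls:
        # Draw the two haplotypes of one individual from the population.
        haps = [[int(rng.random() < allele_freq) for _ in range(n_snps)]
                for _ in range(2)]
        genotype = [haps[0][j] + haps[1][j] for j in range(n_snps)]
        risk_copies = sum(all(h[j] == 1 for j in ds_snps) for h in haps)
        logit = baseline_logit + log_odds_ratio * risk_copies
        affected = rng.random() < 1.0 / (1.0 + math.exp(-logit))
        # Retrospective sampling: keep individuals until both quotas are met.
        if affected and len(cases) < n_cases:
            cases.append(genotype)
        elif not affected and len(controls) < n_controls:
            controls.append(genotype)
    return cases, controls


def allelic_chi2(cases, controls, snp):
    """Pearson chi-squared statistic and p-value (1 df) for the 2x2 table of
    allele counts (cases/controls by allele 1/allele 0) at one SNP."""
    a = sum(g[snp] for g in cases)       # allele 1 in cases
    b = 2 * len(cases) - a               # allele 0 in cases
    c = sum(g[snp] for g in controls)    # allele 1 in controls
    d = 2 * len(controls) - c            # allele 0 in controls
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        return 0.0, 1.0                  # empty group or monomorphic SNP
    stat = (a + b + c + d) * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    # Upper tail of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def single_snp_scan(cases, controls, alpha=0.05):
    """Indices of SNPs significant after a Bonferroni correction."""
    n_snps = len(cases[0])
    threshold = alpha / n_snps
    return [j for j in range(n_snps)
            if allelic_chi2(cases, controls, j)[1] < threshold]


if __name__ == "__main__":
    ds_snps = [3, 7, 12]  # hypothetical positions of the three DS SNPs
    cases, controls = simulate_case_control(
        500, 500, n_snps=50, ds_snps=ds_snps, allele_freq=0.5,
        log_odds_ratio=3.0, baseline_logit=-3.0, seed=2)
    print("DS SNPs:", ds_snps)
    print("Flagged by chi-squared scan:", single_snp_scan(cases, controls))
```

Because the DS SNPs are simulated as independent, each one carries only a diluted marginal trace of the three-SNP haplotype effect; lowering `log_odds_ratio` or `allele_freq` shows how quickly the per-SNP tests lose power in that situation.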