Sequence Analysis Using Logic Regression
Author(s) - Charles Kooperberg, Ingo Ruczinski, Michael L. LeBlanc, Li Hsu
Publication year - 2001
Publication title - Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.2001.21.s1.s626
Subject(s) - false positive paradox , permutation (music) , covariate , regression , regression analysis , single nucleotide polymorphism , multiple comparisons problem , computer science , statistics , genetics , biology , mathematics , gene , genotype , acoustics , physics
Logic regression is a new adaptive regression methodology that attempts to construct predictors as Boolean combinations of binary covariates. In this paper we apply the algorithm to single‐nucleotide polymorphism (SNP) sequence data. The predictors it finds are interpretable as risk factors for the disease. The significance of these risk factors is assessed using cross‐validation, permutation tests, and independent test sets. These model selection techniques remain valid when the data are dependent, as is the case for the family data used here. In our analysis of the Genetic Analysis Workshop 12 data we identify the exact locations of mutations on gene 1 and gene 6, and a number of mutations on gene 2, that are associated with affected status, without selecting any false positives. © 2001 Wiley‐Liss, Inc.
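To make the two central ideas of the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It evaluates one fixed Boolean combination of binary covariates and assesses it with a simple permutation test. Everything named here is an assumption made for illustration: the expression `(X0 AND NOT X1) OR X2`, the simulated data, the 10% label noise, and the function names. The sketch omits the adaptive search over logic expressions, and it permutes outcomes as if observations were exchangeable, so it does not account for the family structure of the workshop data.

```python
import random


def boolean_predictor(row):
    """Hypothetical logic expression over binary covariates:
    (X0 AND NOT X1) OR X2. Returns 0 or 1."""
    return int((row[0] and not row[1]) or row[2])


def misclassification(pred, y):
    """Fraction of observations where prediction and outcome differ."""
    return sum(p != t for p, t in zip(pred, y)) / len(y)


def permutation_p_value(pred, y, n_perm=1000, seed=0):
    """Permutation test: shuffle the outcomes, recompute the error,
    and count how often a shuffled error is at least as small as the
    observed one. Assumes exchangeable observations (an assumption
    that family data would violate without a modified scheme)."""
    rng = random.Random(seed)
    observed = misclassification(pred, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if misclassification(pred, shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def simulate(n=200, n_covariates=5, noise=0.1, seed=1):
    """Simulated binary covariates (stand-ins for SNP indicators) and
    an outcome generated from the hypothetical expression with a
    fraction of labels flipped at random."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_covariates)] for _ in range(n)]
    y = []
    for row in X:
        truth = boolean_predictor(row)
        y.append(truth if rng.random() > noise else 1 - truth)
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    pred = [boolean_predictor(row) for row in X]
    print("misclassification rate:", misclassification(pred, y))
    print("permutation p-value:", permutation_p_value(pred, y))
```

A small p-value here only says that the fixed expression predicts the simulated outcome better than chance. In an adaptive setting, the search for the expression would have to be repeated inside each permutation for the test to remain honest about model selection.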