Bootstrap Aggregating of Alternating Decision Trees to Detect Sets of SNPs That Associate With Disease
Author(s) - Guy Richard T., Santago Peter, Langefeld Carl D.
Publication year - 2012
Publication title - Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.21608
Subject(s) - snp, decision tree, tree (set theory), single nucleotide polymorphism, computer science, tag snp, logistic regression, genetic algorithm, machine learning, set (abstract data type), decision tree learning, artificial intelligence, computational biology, data mining, mathematics, biology, genetics, combinatorics, genotype, gene, programming language
Complex genetic disorders result from a combination of genetic and nongenetic factors, all potentially interacting. Machine learning methods hold the potential to identify multilocus and environmental associations thought to drive complex genetic traits. Decision trees, a popular machine learning technique, offer an algorithm of low computational complexity capable of detecting associated sets of single nucleotide polymorphisms (SNPs) of arbitrary size, including those from modern genome-wide SNP scans. However, interpreting the importance of an individual SNP within these trees can present challenges. We present a new decision tree algorithm denoted as Bagged Alternating Decision Trees (BADTrees) that is based on identifying common structural elements in a bootstrapped set of Alternating Decision Trees (ADTrees). The algorithm is of order nk², where n is the number of SNPs considered and k is the number of SNPs in the tree constructed. Our simulation study suggests that BADTrees have higher power and lower type I error rates than ADTrees alone, and comparable power with lower type I error rates compared to logistic regression. We illustrate the application of the method using simulated data as well as data from the Lupus Large Association Study 1 (7,822 SNPs in 3,548 individuals). Our results suggest that BADTrees hold promise as a low computational order algorithm for detecting complex combinations of SNP and environmental factors associated with disease.
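
The general idea of aggregating SNP selections over bootstrap replicates can be illustrated with a minimal Python sketch. This is not the authors' BADTrees implementation: the base learner here is a simple stand-in that ranks SNPs by the difference in mean genotype between cases and controls, rather than an alternating decision tree, and it only counts how often each SNP is selected rather than comparing tree structure. All function names, the toy simulation, and the parameter values are assumptions made for illustration.

```python
import random
from collections import Counter


def mean_diff_score(genotypes, labels, snp):
    """Absolute difference in mean genotype code between cases and controls."""
    cases = [g[snp] for g, y in zip(genotypes, labels) if y == 1]
    controls = [g[snp] for g, y in zip(genotypes, labels) if y == 0]
    if not cases or not controls:
        return 0.0
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def select_snps(genotypes, labels, k):
    """Stand-in base learner (NOT an ADTree): the k top-scoring SNP indices."""
    n_snps = len(genotypes[0])
    ranked = sorted(
        range(n_snps),
        key=lambda s: mean_diff_score(genotypes, labels, s),
        reverse=True,
    )
    return ranked[:k]


def bagged_selection_frequency(genotypes, labels, k=2, n_boot=50, seed=0):
    """Fraction of bootstrap replicates in which each SNP is selected."""
    rng = random.Random(seed)
    n = len(labels)
    counts = Counter()
    for _ in range(n_boot):
        # Resample individuals with replacement (one bootstrap replicate).
        idx = [rng.randrange(n) for _ in range(n)]
        boot_g = [genotypes[i] for i in idx]
        boot_y = [labels[i] for i in idx]
        counts.update(select_snps(boot_g, boot_y, k))
    return {snp: counts[snp] / n_boot for snp in range(len(genotypes[0]))}


def simulate(n_individuals=400, n_snps=10, causal=3, seed=1):
    """Toy data: genotypes coded 0/1/2; risk rises with the causal SNP only."""
    rng = random.Random(seed)
    genotypes, labels = [], []
    for _ in range(n_individuals):
        g = [rng.choice([0, 1, 2]) for _ in range(n_snps)]
        risk = 0.1 + 0.4 * g[causal]  # hypothetical risks: 0.1, 0.5, 0.9
        genotypes.append(g)
        labels.append(1 if rng.random() < risk else 0)
    return genotypes, labels


genotypes, labels = simulate()
freq = bagged_selection_frequency(genotypes, labels, k=2, n_boot=50)
# SNPs ordered from most to least frequently selected across replicates.
ranked = sorted(freq, key=freq.get, reverse=True)
```

In this sketch, a SNP that is selected in nearly every replicate is treated as a stable finding, while SNPs that appear only sporadically are more likely to reflect sampling noise. That is the intuition behind using agreement across bootstrapped models to limit false positives.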
