Application of machine learning algorithms to predict coronary artery calcification with a sibship‐based design

Author(s) - Sun Yan V., Bielak Lawrence F., Peyser Patricia A., Turner Stephen T., Sheedy Patrick F., Boerwinkle Eric, Kardia Sharon L.R.

Publication year - 2008

Publication title - Genetic Epidemiology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.301

H-Index - 98

eISSN - 1098-2272

pISSN - 0741-0395

DOI - 10.1002/gepi.20309

Subject(s) - random forest, single nucleotide polymorphism, percentile, body mass index, algorithm, medicine, homocysteine, artificial intelligence, machine learning, mathematics, computer science, statistics, biology, genetics, gene, genotype

As part of the Genetic Epidemiology Network of Arteriopathy study, hypertensive non‐Hispanic White sibships were screened using 471 single nucleotide polymorphisms (SNPs) to identify genes influencing coronary artery calcification (CAC) measured by computed tomography. Individuals with detectable CAC and CAC quantity ≥70th age‐ and sex‐specific percentile were classified as having a high CAC burden and compared to individuals with CAC quantity <70th percentile. Two sibs from each sibship were randomly chosen and divided into two data sets, each with 360 unrelated individuals. Within each data set, we applied two machine learning algorithms, Random Forests and RuleFit, to identify the best predictors of having high CAC burden among 17 risk factors and 471 SNPs. Using five‐fold cross‐validation, both methods had ∼70% sensitivity and ∼60% specificity. Prediction accuracies were significantly different from random predictions (P‐value<0.001) based on 1,000 permutation tests. Predictability of using 287 tagSNPs was as good as using all 471 SNPs. For Random Forests, among the top 50 predictors, the same eight tagSNPs and 15 risk factors were found in both data sets, while eight tagSNPs and 12 risk factors were found in both data sets for RuleFit. Replicable effects of two tagSNPs (in genes GPR35 and NOS3) and 12 risk factors (age, body mass index, sex, serum glucose, high‐density lipoprotein cholesterol, systolic blood pressure, cholesterol, homocysteine, triglycerides, fibrinogen, Lp(a) and low‐density lipoprotein particle size) were identified by both methods. This study illustrates how machine learning methods can be used in sibships to identify important, replicable predictors of subclinical coronary atherosclerosis. Genet. Epidemiol. 2008. © 2008 Wiley‐Liss, Inc.
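
The sibship split and the permutation-based significance check described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code: the function names, the dictionary layout of `sibships`, and the choice to skip sibships with fewer than two members are all assumptions made for the example. The permutation test shown here shuffles the outcome labels against a fixed set of predictions, which is a simplification; the study may instead have refit the models on each permuted data set.

```python
import random


def split_sibships(sibships, seed=0):
    """Pick two sibs per sibship at random and send one to each data set.

    `sibships` maps a sibship ID to a list of individual IDs (hypothetical
    layout). Each data set receives at most one member per sibship, so the
    individuals within a data set are unrelated to each other.
    """
    rng = random.Random(seed)
    set_a, set_b = [], []
    for sibship_id in sorted(sibships):
        members = sibships[sibship_id]
        if len(members) < 2:
            continue  # assumption: sibships with fewer than two sibs are skipped
        first, second = rng.sample(members, 2)
        set_a.append(first)
        set_b.append(second)
    return set_a, set_b


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for 0/1 labels.

    Assumes both classes occur in `y_true`; 1 stands for high CAC burden.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels match the predictions as well
    as the real labels do (simplified permutation test on accuracy)."""
    rng = random.Random(seed)

    def accuracy(labels):
        return sum(1 for t, p in zip(labels, y_pred) if t == p) / len(labels)

    observed = accuracy(y_true)
    shuffled = list(y_true)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_good + 1) / (n_permutations + 1)
```

With 1,000 permutations and the add-one correction, the smallest attainable estimate is 1/1001, which is consistent with reporting a P‐value below 0.001 when no permuted labelling performs as well as the observed one.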
