Prioritizing individual genetic variants after kernel machine testing using variable selection
Author(s) -
He Qianchuan,
Cai Tianxi,
Liu Yang,
Zhao Ni,
Harmon Quaker E.,
Almli Lynn M.,
Binder Elisabeth B.,
Engel Stephanie M.,
Ressler Kerry J.,
Conneely Karen N.,
Lin Xihong,
Wu Michael C.
Publication year - 2016
Publication title -
Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.21993
Subject(s) - kernel (algebra) , snp , kernel method , computer science , selection (genetic algorithm) , single nucleotide polymorphism , feature selection , set (abstract data type) , artificial intelligence , machine learning , computational biology , data mining , biology , genetics , mathematics , support vector machine , genotype , gene , combinatorics , programming language
ABSTRACT Kernel machine learning methods, such as the SNP‐set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single‐SNP analysis methods, these methods examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and can identify sets of SNPs that are associated with the trait of interest. However, as with many multi‐SNP testing approaches, kernel machine testing can draw conclusions only at the SNP‐set level and does not directly indicate which SNP(s) in an identified set are actually driving the association. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs. We adapt the KNIFE procedure to genetic association studies and propose an approach to identify driver SNPs after SKAT has been applied in gene set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel. The proposed approach provides practical tools for prioritizing SNPs and fills the gap between SNP set analysis and biological functional studies. Both simulation studies and a real data application are used to demonstrate the proposed approach.
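For readers unfamiliar with the two kernels named in the abstract, the following is a minimal sketch of how they are commonly computed. It assumes genotypes coded as minor-allele counts (0, 1, 2) in an n-by-p matrix and uses unweighted kernels. The function names (`linear_kernel`, `ibs_kernel`, `score_statistic`) are hypothetical and chosen for illustration. The statistic shown is a simplified set-level score under an intercept-only null model for a quantitative trait; it is not the KNIFE selection procedure or the authors' implementation, and it omits the p-value calculation.

```python
import numpy as np

def linear_kernel(G):
    """Unweighted linear kernel: K[i, j] = sum over SNPs of g_i * g_j."""
    G = np.asarray(G, dtype=float)
    return G @ G.T

def ibs_kernel(G):
    """Unweighted IBS kernel for 0/1/2 genotypes (assumed coding).

    K[i, j] = sum over SNPs of (2 - |g_i - g_j|) / (2 * p),
    i.e. the proportion of alleles shared identical by state.
    """
    G = np.asarray(G, dtype=float)
    p = G.shape[1]
    # Pairwise absolute genotype differences, summed across SNPs.
    diff = np.abs(G[:, None, :] - G[None, :, :]).sum(axis=2)
    return (2 * p - diff) / (2 * p)

def score_statistic(y, K):
    """Simplified set-level statistic Q = r' K r, where r holds the
    residuals from an intercept-only null model (an assumption here)."""
    y = np.asarray(y, dtype=float)
    r = y - y.mean()
    return float(r @ K @ r)

# Toy data: 3 subjects, 3 SNPs (illustrative values only).
G = np.array([[0, 1, 2],
              [0, 1, 2],
              [2, 1, 0]])
y = np.array([1.0, 2.0, 6.0])

K_lin = linear_kernel(G)
K_ibs = ibs_kernel(G)
Q_lin = score_statistic(y, K_lin)
```

Both kernels yield a single statistic for the whole SNP set, which illustrates the limitation the abstract describes: the test on its own gives no per-SNP ranking.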