A System‐Level Pathway‐Phenotype Association Analysis Using Synthetic Feature Random Forest
Author(s) - Pan Qinxin, Hu Ting, Malley James D., Andrew Angeline S., Karagas Margaret R., Moore Jason H.
Publication year - 2014
Publication title - Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.21794
Subject(s) - computational biology, genome wide association study, epistasis, random forest, biological pathway, biology, phenotype, single nucleotide polymorphism, systems biology, genetic association, genetics, interpretability, gene, computer science, genotype, artificial intelligence, gene expression
As the cost of genome‐wide genotyping decreases, the number of genome‐wide association studies (GWAS) has increased considerably. However, the transition from GWAS findings to the underlying biology of various phenotypes remains challenging. Because of its system‐level interpretability, pathway analysis has become a popular tool for gaining insights into the underlying biology from high‐throughput genetic association data. In pathway analyses, gene sets representing particular biological processes are tested for significant associations with a given phenotype. Most existing pathway analysis approaches rely on single‐marker statistics and assume that pathways are independent of each other. Because biological systems are driven by complex biomolecular interactions, the complex relationships among single‐nucleotide polymorphisms (SNPs) and among pathways need to be taken into account. To incorporate the complexity of gene‐gene interactions and pathway‐pathway relationships, we propose a system‐level pathway analysis approach, synthetic feature random forest (SF‐RF), which is designed to detect pathway‐phenotype associations without making assumptions about the relationships among SNPs or pathways. In our approach, the genotypes of SNPs in a particular pathway are aggregated via Random Forest (RF) into a synthetic feature representing that pathway. Multiple synthetic features are then analyzed simultaneously using RF, and the significance of a synthetic feature indicates the significance of the corresponding pathway. We further complement SF‐RF with pathway‐based Statistical Epistasis Network (SEN) analysis, which evaluates interactions among pathways. By investigating the pathway SEN, we hope to gain additional insights into the genetic mechanisms contributing to the pathway‐phenotype association. We apply SF‐RF to a population‐based genetic study of bladder cancer and use SEN to further investigate the mechanisms that help explain the pathway‐phenotype associations. The bladder cancer‐associated pathways we found are consistent with existing biological knowledge and suggest novel, plausible hypotheses for future biological validation.
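The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn and NumPy are available, that the phenotype is coded 0/1, and that a pathway's synthetic feature is the out-of-bag predicted case probability from a per-pathway forest (the abstract does not say how the synthetic feature is computed). The function names `synthetic_feature` and `sf_rf`, the pathway names, and the simulated genotypes are all hypothetical. The second stage ranks pathways by impurity-based importance only; the significance testing mentioned in the abstract is omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def synthetic_feature(genotypes, phenotype, n_trees=200, seed=0):
    """Collapse one pathway's SNP genotypes into one score per sample.

    Assumption: the score is the out-of-bag (OOB) predicted probability
    of being a case, so each sample is scored only by trees that did not
    see it during training.
    """
    rf = RandomForestClassifier(
        n_estimators=n_trees, oob_score=True, bootstrap=True, random_state=seed
    )
    rf.fit(genotypes, phenotype)
    # Column 1 corresponds to class label 1 (cases) for 0/1 phenotypes.
    return rf.oob_decision_function_[:, 1]


def sf_rf(pathways, phenotype, n_trees=200, seed=0):
    """Stage 1: one synthetic feature per pathway.
    Stage 2: a single forest over all synthetic features at once."""
    names = list(pathways)
    S = np.column_stack(
        [synthetic_feature(pathways[name], phenotype, n_trees, seed) for name in names]
    )
    top = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    top.fit(S, phenotype)
    return S, dict(zip(names, top.feature_importances_))


# Simulated data (hypothetical): 300 samples, two pathways of 5 SNPs each,
# genotypes coded 0/1/2. Only the first pathway carries signal.
rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, size=n)
noise = rng.integers(0, 3, size=(n, 5))
signal = rng.integers(0, 3, size=(n, 5))
# For roughly 80% of samples, tie the first SNP's genotype to the phenotype.
signal[:, 0] = np.where(rng.random(n) < 0.8, 2 * y, signal[:, 0])

S, importance = sf_rf({"signal_pathway": signal, "noise_pathway": noise}, y)
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Using out-of-bag probabilities in the first stage is a design choice of this sketch: it keeps the synthetic feature from simply memorizing the training labels, which would otherwise make every pathway look predictive in the second stage.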