A framework for pathway knowledge driven prioritization in genome‐wide association studies

Biswas Shrayashi; Pal Soumen; Majumder Partha P.; Bhattacharjee Samsiddhi




Publication year - 2020

Publication title - Genetic Epidemiology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.301

H-Index - 98

eISSN - 1098-2272

pISSN - 0741-0395

DOI - 10.1002/gepi.22345

Subject(s) - genome wide association study, overfitting, computer science, prioritization, computational biology, genome, scalability, statistical power, genetic association, data mining, machine learning, gene, biology, genetics, single nucleotide polymorphism, mathematics, statistics, management science, database, genotype, artificial neural network, economics

Many variants with low frequencies or with low to modest effects likely remain unidentified in genome‐wide association studies (GWAS) because of stringent genome‐wide thresholds for detection. To improve the power of detection, prioritization of variants based on their functional annotations and epigenetic landmarks has been used successfully. Here, we propose a novel method for prioritizing a GWAS scan by exploiting gene‐level knowledge (e.g., annotations to pathways and ontologies) and show that it further improves power. Often, disease‐associated variants are found near genes that are co‐involved in specific biological pathways relevant to the disease process. Using this knowledge to conduct a prioritized scan increases the power to detect loci that map to genes clustered in a few specific pathways. We have developed a computationally scalable framework based on penalized logistic regression (termed GKnowMTest, for Genomic Knowledge‐guided Multiple Testing) to enable a prioritized pathway‐guided GWAS scan with a very large number of gene‐level annotations. We demonstrate that the proposed strategy improves overall power and maintains the Type 1 error rate globally. Our method works on genome‐wide summary‐level data and a user‐specified list of pathways (e.g., those extracted from large pathway databases without reference to the biology of a specific disease). It automatically reweights the input p values by incorporating the pathway enrichments as “adaptively learned” from the data, using a cross‐validation technique to avoid overfitting. We used whole‐genome simulations and some publicly available GWAS data sets to illustrate the application of our method. The GKnowMTest framework has been implemented as a user‐friendly open‐source R package.
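The reweighting idea in the abstract can be illustrated with a standard weighted Bonferroni test, in which each p value is divided by a weight and the weights average to 1 so that the global Type 1 error rate is preserved. The sketch below is a minimal Python illustration under stated assumptions, not the GKnowMTest implementation or its R interface: the function names, the toy p values, and the "prior" scores (standing in for pathway enrichments that the actual method learns by cross‐validated penalized logistic regression) are all hypothetical.

```python
def normalize_weights(raw_weights):
    """Rescale nonnegative weights so that they average to 1."""
    if any(w < 0 for w in raw_weights):
        raise ValueError("weights must be nonnegative")
    total = sum(raw_weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    m = len(raw_weights)
    return [w * m / total for w in raw_weights]


def weighted_bonferroni(p_values, raw_weights, alpha=0.05):
    """Weighted Bonferroni: reject test i when p_i / w_i <= alpha / m.

    Because the normalized weights average to 1, the family-wise
    error rate stays at or below alpha.
    """
    if len(p_values) != len(raw_weights):
        raise ValueError("p_values and raw_weights must have equal length")
    weights = normalize_weights(raw_weights)
    threshold = alpha / len(p_values)
    # A zero weight means the test can never be rejected.
    weighted_p = [min(1.0, p / w) if w > 0 else 1.0
                  for p, w in zip(p_values, weights)]
    rejected = [wp <= threshold for wp in weighted_p]
    return weighted_p, rejected


# Toy data (hypothetical): four variants; the first maps to a gene in an
# enriched pathway and therefore gets a larger prior score.
p_values = [0.02, 0.04, 0.3, 0.8]
prior = [0.4, 0.1, 0.1, 0.1]

weighted_p, rejected = weighted_bonferroni(p_values, prior, alpha=0.05)
_, unweighted_rejected = weighted_bonferroni(p_values, [1, 1, 1, 1], alpha=0.05)

print(rejected)             # [True, False, False, False]
print(unweighted_rejected)  # [False, False, False, False]
```

In this toy case the first variant misses the unweighted threshold of 0.0125 but passes once it is up‐weighted, while the down‐weighted variants pay a correspondingly higher bar. This is the trade‐off that makes the choice of weights matter.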
