Open Access
An Optimal Bahadur-Efficient Method in Detection of Sparse Signals with Applications to Pathway Analysis in Sequencing Association Studies
Author(s) -
Hongying Dai,
Guodong Wu,
Michael C. Wu,
Degui Zhi
Publication year - 2016
Publication title -
PLOS ONE
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.99
H-Index - 332
ISSN - 1932-6203
DOI - 10.1371/journal.pone.0152667
Subject(s) - curse of dimensionality, sample size determination, computational biology, genetic association, computer science, false discovery rate, exome sequencing, kernel (algebra), multiple comparisons problem, kernel density estimation, biology, mathematics, statistics, genetics, gene, artificial intelligence, genotype, mutation, single nucleotide polymorphism, combinatorics, estimator
Next-generation sequencing data pose a severe curse of dimensionality, complicating traditional "single-marker, single-trait" analysis. We propose a two-stage combined p-value method for pathway analysis. The first stage is at the gene level, where we integrate effects within a gene using the Sequence Kernel Association Test (SKAT). The second stage is at the pathway level, where we perform a correlated Lancaster procedure to detect joint effects from multiple genes within a pathway. We show that the Lancaster procedure is optimal in Bahadur efficiency among all combined p-value methods. The Bahadur efficiency, $\lim_{\varepsilon \to 0} N^{(2)}/N^{(1)} = \phi_{12}(\theta)$, where $N^{(i)}$ is the sample size test $i$ needs for its p-value to fall below $\varepsilon$, compares the sample sizes required by different statistical tests when signals become sparse in sequencing data, i.e., as $\varepsilon \to 0$. The optimal Bahadur efficiency ensures that the Lancaster procedure asymptotically requires the minimal sample size to detect sparse signals ($P_{N^{(i)}} < \varepsilon \to 0$). The Lancaster procedure can also be applied to meta-analysis. Extensive empirical assessments of exome sequencing data show that the proposed method outperforms Gene Set Enrichment Analysis (GSEA). We applied the competitive Lancaster procedure to meta-analysis data generated by the Global Lipids Genetics Consortium to identify pathways significantly associated with high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglycerides, and total cholesterol.
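The abstract does not spell out how the Lancaster procedure combines gene-level p-values. In its standard (independent) form, Lancaster's procedure maps each p-value $p_i$ with weight $w_i > 0$ to $T = \sum_i F^{-1}_{\chi^2_{w_i}}(1 - p_i)$, which under the null with independent p-values follows $\chi^2_{\sum_i w_i}$. Setting every $w_i = 2$ recovers Fisher's method. The sketch below is a minimal, dependency-free Python illustration under stated assumptions: it implements only this independent version, not the correlated Lancaster procedure used in the paper (which also adjusts the null distribution for correlation between genes in a pathway). The names `chi2_sf`, `chi2_isf`, and `lancaster`, and the example p-values and weights, are hypothetical. The chi-square tail is computed from the regularized incomplete gamma function (power series plus Lentz continued fraction).

```python
import math

_EPS = 1e-14
_FPMIN = 1e-300


def _gamma_p_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series (use when x < a + 1)."""
    ap, term = a, 1.0 / a
    total = term
    for _ in range(10_000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * _EPS:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _gamma_q_contfrac(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) via Lentz's continued fraction (x >= a + 1)."""
    b = x + 1.0 - a
    c = 1.0 / _FPMIN
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < _FPMIN:
            d = _FPMIN
        c = b + an / c
        if abs(c) < _FPMIN:
            c = _FPMIN
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < _EPS:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_sf(x: float, df: float) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square with df degrees of freedom."""
    if x <= 0.0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    if half_x < a + 1.0:
        return 1.0 - _gamma_p_series(a, half_x)
    return _gamma_q_contfrac(a, half_x)


def chi2_isf(p: float, df: float) -> float:
    """Inverse of chi2_sf: the x with P(X > x) = p, found by bisection."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p == 1.0:
        return 0.0
    lo, hi = 0.0, max(1.0, df)
    while chi2_sf(hi, df) > p:  # widen the bracket until it contains the root
        lo, hi = hi, 2.0 * hi
    for _ in range(200):  # chi2_sf is decreasing, so bisect on it
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lancaster(pvalues, weights):
    """Lancaster's weighted combination of INDEPENDENT p-values (no correlation adjustment).

    T = sum_i chi2_isf(p_i, w_i); under H0, T ~ chi-square(sum_i w_i).
    All weights equal to 2 gives Fisher's method.
    """
    if not pvalues or len(pvalues) != len(weights):
        raise ValueError("need one weight per p-value")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    t = sum(chi2_isf(p, w) for p, w in zip(pvalues, weights))
    return chi2_sf(t, sum(weights))


if __name__ == "__main__":
    gene_p = [0.01, 0.2, 0.5]  # hypothetical gene-level (e.g., SKAT) p-values
    print(lancaster(gene_p, [2, 2, 2]))   # Fisher's method
    print(lancaster(gene_p, [10, 4, 1]))  # hypothetical unequal weights
```

With all weights equal to 2 the result can be checked against Fisher's closed form, $e^{-T/2}\sum_{j<k}(T/2)^j/j!$ with $T = -2\sum_i \ln p_i$. How the weights are chosen, and how correlation between genes is handled, are the parts this sketch leaves to the paper.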