Empirical null estimation using zero‐inflated discrete mixture distributions and its application to protein domain data

Author(s) - Gauran Iris Ivy M., Park Junyong, Lim Johan, Park DoHwan, Zylstra John, Peterson Thomas, Kann Maricel, Spouge John L.

Publication year - 2018

Publication title - Biometrics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 2.298

H-Index - 130

eISSN - 1541-0420

pISSN - 0006-341X

DOI - 10.1111/biom.12779

Subject(s) - mathematics, Poisson distribution, inference, null hypothesis

Summary In recent mutation studies, analyses based on protein domain positions are gaining popularity over gene‐centric approaches, since the latter are limited in how they account for the functional context that the position of a mutation provides. This presents a large‐scale simultaneous inference problem, with hundreds of hypothesis tests to consider at the same time. This article aims to select significant mutation counts while controlling Type I error at a given level via False Discovery Rate (FDR) procedures. One main assumption is that the mutation counts follow a zero‐inflated model, which accounts for both the true zeros in the count model and the excess zeros. The class of models considered is the Zero‐inflated Generalized Poisson (ZIGP) distribution. Furthermore, we assume that there exists a cut‐off value such that counts smaller than this value are generated from the null distribution. We present several data‐dependent methods to determine the cut‐off value. We also consider a two‐stage procedure based on a screening process, so that mutation counts exceeding a certain value are considered significant. Simulated and protein domain data sets are used to illustrate this procedure for estimating the empirical null using a mixture of discrete distributions. Overall, while maintaining control of the FDR, the proposed two‐stage testing procedure has superior empirical power.
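To make the ingredients named in the summary concrete, the following is a minimal Python sketch, not the authors' procedure. It assumes Consul's parametrization of the generalized Poisson distribution (rate `theta`, dispersion `lam`), which the article may or may not use, and it substitutes a plain Benjamini-Hochberg step-up rule for the paper's empirical-null FDR method and two-stage screening. The function names, the null parameter values, and the example counts are all hypothetical.

```python
import math

def gp_pmf(x, theta, lam):
    """Generalized Poisson pmf (Consul's form; assumes theta > 0, 0 <= lam < 1)."""
    log_p = (math.log(theta) + (x - 1) * math.log(theta + lam * x)
             - theta - lam * x - math.lgamma(x + 1))
    return math.exp(log_p)

def zigp_pmf(x, pi0, theta, lam):
    """Zero-inflated generalized Poisson: extra point mass pi0 at zero."""
    base = (1.0 - pi0) * gp_pmf(x, theta, lam)
    return pi0 + base if x == 0 else base

def upper_tail_pvalue(x, pi0, theta, lam):
    """P(X >= x) under an assumed null ZIGP, used as a one-sided p-value."""
    below = sum(zigp_pmf(k, pi0, theta, lam) for k in range(x))
    return max(0.0, 1.0 - below)

def benjamini_hochberg(pvals, alpha=0.05):
    """Standard BH step-up rule; returns one rejection flag per p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k_max = rank  # largest rank that passes its threshold
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

# Hypothetical mutation counts at positions of one domain (illustration only).
counts = [0, 0, 1, 0, 2, 0, 14, 1, 0, 9]
null_params = dict(pi0=0.3, theta=1.5, lam=0.2)  # assumed, not estimated
pvals = [upper_tail_pvalue(c, **null_params) for c in counts]
flags = benjamini_hochberg(pvals, alpha=0.05)
```

In the article the null parameters are estimated from the counts below a data-dependent cut-off rather than fixed in advance as here. Because the p-values are discrete, the BH rule in this sketch still controls the FDR but tends to be conservative.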
