Empirical null estimation using zero‐inflated discrete mixture distributions and its application to protein domain data
Author(s) - Gauran Iris Ivy M., Park Junyong, Lim Johan, Park DoHwan, Zylstra John, Peterson Thomas, Kann Maricel, Spouge John L.
Publication year - 2018
Publication title - Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/biom.12779
Subject(s) - mathematics, Poisson distribution, inference, null hypothesis
Summary In recent mutation studies, analyses based on protein domain positions are gaining popularity over gene‐centric approaches, since the latter are limited in how well they capture the functional context that the position of a mutation provides. This presents a large‐scale simultaneous inference problem, with hundreds of hypothesis tests to consider at the same time. This article aims to select significant mutation counts while controlling the Type I error at a given level via False Discovery Rate (FDR) procedures. One main assumption is that the mutation counts follow a zero‐inflated model, which accounts for both the true zeros in the count model and the excess zeros. The class of models considered is the Zero‐inflated Generalized Poisson (ZIGP) distribution. Furthermore, we assume that there exists a cut‐off value such that counts smaller than this value are generated from the null distribution. We present several data‐dependent methods to determine the cut‐off value. We also consider a two‐stage procedure based on a screening process, in which mutation counts exceeding a certain value are considered significant. Simulated and protein domain data sets are used to illustrate this procedure for estimating the empirical null using a mixture of discrete distributions. Overall, while maintaining control of the FDR, the proposed two‐stage testing procedure has superior empirical power.
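To make the modelling assumption concrete, the following minimal Python sketch evaluates a ZIGP probability mass function and applies a generic FDR step to upper-tail p-values. Several things here are assumptions rather than details taken from the article: the generalized Poisson component uses Consul's parameterization with rate `theta > 0` and dispersion `0 <= lam < 1`, the zero-inflation weight is called `pi0`, and the Benjamini-Hochberg step-up rule stands in for the FDR procedures mentioned in the summary. It does not reproduce the authors' cut-off selection or their two-stage procedure, and all function names are hypothetical.

```python
import math


def gp_pmf(y, theta, lam):
    """Generalized Poisson pmf in Consul's form (assumed parameterization).

    P(Y = y) = theta * (theta + lam*y)**(y-1) * exp(-theta - lam*y) / y!
    Assumes theta > 0 and 0 <= lam < 1; lam = 0 gives the ordinary Poisson.
    """
    log_p = (math.log(theta) + (y - 1) * math.log(theta + lam * y)
             - theta - lam * y - math.lgamma(y + 1))
    return math.exp(log_p)


def zigp_pmf(y, pi0, theta, lam):
    """Zero-inflated GP: extra point mass pi0 at zero, weight 1 - pi0 on the GP."""
    base = (1.0 - pi0) * gp_pmf(y, theta, lam)
    return pi0 + base if y == 0 else base


def upper_tail_pvalue(y, pi0, theta, lam):
    """P(Y >= y) under a ZIGP null with the given (assumed known) parameters."""
    lower = sum(zigp_pmf(k, pi0, theta, lam) for k in range(y))
    return max(0.0, 1.0 - lower)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Generic BH step-up rule; returns the sorted indices of rejected hypotheses."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under the BH threshold.
        if pvalues[i] <= alpha * rank / m:
            k_max = rank
    return sorted(order[:k_max])
```

As a usage illustration, `benjamini_hochberg([upper_tail_pvalue(y, 0.3, 1.5, 0.2) for y in counts])` would flag the positions in a hypothetical list `counts` whose mutation counts are unusually large under that null. In practice the null parameters would have to be estimated from the data, which is the part the article addresses through its cut-off value.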