
Nonidentifiability in the presence of factorization for truncated data
Author(s) -
Bella Vakulenko-Lagun,
Jing Qian,
Shyh-Horng Chiou,
Rebecca A. Betensky
Publication year - 2019
Publication title -
Biometrika
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.307
H-Index - 122
eISSN - 1464-3510
pISSN - 0006-3444
DOI - 10.1093/biomet/asz023
Subject(s) - factorization, mathematics, estimator, truncation (statistics), censoring (clinical trials), independence (probability theory), conditional independence, distribution (mathematics), observable, statistics, combinatorics, algorithm, mathematical analysis, physics, quantum mechanics
A time to event, X, is left-truncated by T if X can be observed only if T < X. This often results in oversampling of large values of X, and necessitates adjustment of estimation procedures to avoid bias. Simple risk-set adjustments can be made to standard risk-set-based estimators to accommodate left truncation when T and X are quasi-independent. We derive a weaker factorization condition for the conditional distribution of T given X in the observable region that permits risk-set adjustment for estimation of the distribution of X, but not of the distribution of T. Quasi-independence results when the analogous factorization condition for X given T holds also, in which case the distributions of X and T are easily estimated. While we can test for factorization, if the test does not reject, we cannot identify which factorization condition holds, or whether quasi-independence holds. Hence we require an unverifiable assumption in order to estimate the distribution of X or T based on truncated data. This contrasts with the common understanding that truncation is different from censoring in requiring no unverifiable assumptions for estimation. We illustrate these concepts through a simulation of left-truncated and right-censored data.
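The oversampling of large event times, and the risk-set adjustment that corrects for it, can be illustrated with a small simulation. The following is a minimal sketch and not the authors' simulation: it writes X for the event time and T for the truncation time (labels assumed here), draws them independently as Exp(1) and Uniform(0, 2) (an illustrative design chosen for this example), omits right censoring, and assumes no tied event times. The function names are hypothetical. Under this design the true survival probability at 1 is exp(-1), about 0.368, while the naive empirical estimate targets P(X > 1 | T < X), about 0.69.

```python
import numpy as np


def simulate_left_truncated(n_draws, rng):
    """Draw independent X ~ Exp(1), T ~ Uniform(0, 2); keep only pairs with T < X.

    The distributions are illustrative assumptions, not taken from the paper.
    """
    x = rng.exponential(scale=1.0, size=n_draws)
    t = rng.uniform(0.0, 2.0, size=n_draws)
    observed = t < x  # left truncation: a pair is seen only if T < X
    return t[observed], x[observed]


def truncation_adjusted_survival(t, x, u):
    """Product-limit estimate of P(X > u) with risk sets adjusted for left truncation.

    At each observed event time s, the risk set contains the subjects with
    t_i < s <= x_i. Assumes continuous data (no ties) and no censoring.
    """
    xs = np.sort(x)
    # Number of subjects whose truncation time precedes each event time.
    entered = np.searchsorted(np.sort(t), xs, side="left")
    # Number of subjects whose event occurred strictly before each event time.
    exited = np.arange(xs.size)
    at_risk = entered - exited  # always >= 1, since t_j < x_j for every subject
    surv = np.concatenate(([1.0], np.cumprod(1.0 - 1.0 / at_risk)))
    # Index by the number of event times <= u.
    return surv[np.searchsorted(xs, u, side="right")]


rng = np.random.default_rng(2019)
t_obs, x_obs = simulate_left_truncated(20_000, rng)

u = 1.0
naive = np.mean(x_obs > u)  # ignores truncation, so it is biased upward
adjusted = truncation_adjusted_survival(t_obs, x_obs, u)

print(f"true S({u})        = {np.exp(-u):.3f}")
print(f"naive estimate     = {naive:.3f}")
print(f"adjusted estimate  = {adjusted:.3f}")
```

Because T and X are generated independently here, quasi-independence holds by construction, which is exactly the kind of assumption the abstract argues cannot be verified from the truncated sample alone.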