Optimal two‐phase sampling for estimating the area under the receiver operating characteristic curve
Author(s) - Wu Yougui
Publication year - 2020
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.8819
Subject(s) - statistics , sampling (signal processing) , estimator , sample size determination , mathematics , variance reduction , sampling design , population , sample (material) , simple random sample , variance (accounting) , receiver operating characteristic , computer science , monte carlo method , medicine , filter (signal processing) , chemistry , environmental health , accounting , chromatography , business , computer vision
Statistical methods are well developed for estimating the area under the receiver operating characteristic curve (AUC) based on a random sample where the gold standard is available for every subject in the sample, or a two‐phase sample where the gold standard is ascertained only at the second phase for a subset of subjects sampled using fixed sampling probabilities. However, the methods based on a two‐phase sample do not attempt to optimize the sampling probabilities to minimize the variance of the AUC estimator. In this paper, we consider the optimal two‐phase sampling design for evaluating the performance of an ordinal test in classifying disease status. We derived the analytic variance formula for the AUC estimator and used it to obtain the optimal sampling probabilities. The efficiency of two‐phase sampling under the optimal sampling probabilities (OA) is evaluated by a simulation study, which indicates that two‐phase sampling under OA achieves a substantial variance reduction by over‐sampling subjects with low and high ordinal levels, compared with two‐phase sampling under proportional allocation (PA). Furthermore, in comparison with one‐phase random sampling, two‐phase sampling under OA or PA has a clear advantage in reducing the variance of the AUC estimator when the variance of diagnostic test results in the diseased population is small relative to its counterpart in the nondiseased population. Finally, we applied the optimal two‐phase sampling design to a real‐world example to evaluate the performance of a questionnaire score in screening for childhood asthma.
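The two‐phase design described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's estimator or its optimal allocation: it assumes a standard inverse‐probability‐weighted (IPW) AUC estimate for an ordinal test, and every name and number in it (`weighted_auc`, `two_phase_sample`, `simulate_phase_one`, the five test levels, the prevalence by level, and both sets of sampling probabilities) is a hypothetical choice made for illustration only.

```python
import random


def weighted_auc(levels, disease, weights):
    """IPW estimate of AUC = P(T_d > T_n) + 0.5 * P(T_d = T_n) for an ordinal test."""
    ks = sorted(set(levels))
    d = {k: 0.0 for k in ks}  # weighted count of diseased subjects per level
    n = {k: 0.0 for k in ks}  # weighted count of nondiseased subjects per level
    for t, y, w in zip(levels, disease, weights):
        if y:
            d[t] += w
        else:
            n[t] += w
    tot_d, tot_n = sum(d.values()), sum(n.values())
    if tot_d == 0 or tot_n == 0:
        raise ValueError("need both diseased and nondiseased subjects")
    auc, cum_n = 0.0, 0.0  # cum_n: nondiseased weight strictly below level k
    for k in ks:
        auc += d[k] * (cum_n + 0.5 * n[k])  # ties count one half
        cum_n += n[k]
    return auc / (tot_d * tot_n)


def two_phase_sample(levels, disease, probs, rng):
    """Phase 2: verify each phase-1 subject with a probability set by its test level.

    Returns the verified levels, their disease status, and weights 1 / probability.
    """
    out_t, out_y, out_w = [], [], []
    for t, y in zip(levels, disease):
        if rng.random() < probs[t]:
            out_t.append(t)
            out_y.append(y)
            out_w.append(1.0 / probs[t])
    return out_t, out_y, out_w


def simulate_phase_one(size, rng):
    """Hypothetical population: 5 ordinal levels, disease risk rising with level.

    In a real study the disease status is unknown at phase 1; it is generated
    here only so that phase 2 can "reveal" it for the sampled subjects.
    """
    levels, disease = [], []
    for _ in range(size):
        t = rng.randint(1, 5)
        p = 0.05 + 0.15 * (t - 1)  # assumed prevalence per level: 0.05 to 0.65
        levels.append(t)
        disease.append(1 if rng.random() < p else 0)
    return levels, disease


if __name__ == "__main__":
    rng = random.Random(2020)
    levels, disease = simulate_phase_one(5000, rng)

    # Equal probabilities at every level (proportional allocation).
    pa = {k: 0.3 for k in range(1, 6)}
    # Illustrative over-sampling of the extreme levels; NOT the optimal
    # probabilities derived in the paper.
    extremes = {1: 0.45, 2: 0.2, 3: 0.2, 4: 0.2, 5: 0.45}

    print("full data :", weighted_auc(levels, disease, [1.0] * len(levels)))
    print("PA        :", weighted_auc(*two_phase_sample(levels, disease, pa, rng)))
    print("extremes  :", weighted_auc(*two_phase_sample(levels, disease, extremes, rng)))
```

Repeating the phase‐two draw many times and comparing the spread of the estimates under each set of probabilities would give a rough empirical view of the variance comparison the abstract reports; a single run as above only shows that both designs target the same quantity.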