Design and Inference for Cancer Biomarker Study with an Outcome and Auxiliary‐Dependent Subsampling
Author(s) - Wang Xiaofei, Zhou Haibo
Publication year - 2010
Publication title - Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/j.1541-0420.2009.01280.x
Subject(s) - biomarker , estimator , outcome (game theory) , computer science , covariate , inference , statistics , sampling (signal processing) , sample (material) , oncology , medicine , mathematics , artificial intelligence , machine learning , biology , genetics , chemistry , mathematical economics , chromatography , filter (signal processing) , computer vision
Summary In cancer research, it is important to evaluate, in a large prospective study, the performance of a biomarker (e.g., molecular, genetic, or imaging) that correlates with patients' prognosis or predicts patients' response to treatment. Because of overall budget constraints and the high cost of bioassays, investigators often have to select a subset of all registered patients for biomarker assessment. To detect a potentially moderate association between the biomarker and the outcome, investigators need to decide how to select a subset of fixed size so as to improve study efficiency. We show that, instead of drawing a simple random sample from the study cohort, greater efficiency can be achieved by allowing the selection probability to depend on the outcome and an auxiliary variable; we refer to such a sampling scheme as outcome and auxiliary‐dependent subsampling (OADS). This article is motivated by the need to analyze data from a lung cancer biomarker study that adopts the OADS design to assess epidermal growth factor receptor (EGFR) mutations as a predictive biomarker for whether a subject responds to a greater extent to EGFR inhibitor drugs. We propose an estimated maximum‐likelihood method that accommodates the OADS design and utilizes all observed information, especially the information contained in the likelihood score of EGFR mutations (an auxiliary variable for EGFR mutation status), which is available for all patients. We derive the asymptotic properties of the proposed estimator and evaluate its finite sample properties via simulation. We illustrate the proposed method with a data example.
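The idea of letting the selection probability depend on both the outcome and an auxiliary variable can be illustrated with a minimal Python sketch. It is not the authors' design or their estimated maximum‐likelihood method: the function name `oads_subsample`, the binary outcome `y`, the auxiliary score `w` cut at 0.5, the synthetic cohort, and the per‐stratum quotas are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def oads_subsample(cohort, quotas, seed=0):
    """Select patients for assay with quotas per (outcome, auxiliary) stratum.

    cohort: list of dicts with keys 'id', 'y' (0/1 outcome) and
            'w' (auxiliary score in [0, 1]); all hypothetical.
    quotas: dict mapping a stratum key (y, w >= 0.5) to the number of
            patients to assay in that stratum (hypothetical values).
    Returns the sorted selected ids and the realized sampling fraction
    in each stratum.
    """
    rng = random.Random(seed)

    # Group patient ids by stratum: outcome crossed with high/low auxiliary.
    strata = defaultdict(list)
    for patient in cohort:
        key = (patient["y"], patient["w"] >= 0.5)
        strata[key].append(patient["id"])

    selected = []
    fractions = {}
    for key, ids in strata.items():
        # Never request more patients than the stratum contains.
        k = min(quotas.get(key, 0), len(ids))
        selected.extend(rng.sample(ids, k))
        fractions[key] = k / len(ids)
    return sorted(selected), fractions


# Synthetic cohort: the outcome is more likely when the auxiliary score is high.
_rng = random.Random(1)
cohort = []
for i in range(1000):
    w = _rng.random()
    y = 1 if _rng.random() < 0.1 + 0.3 * w else 0
    cohort.append({"id": i, "y": y, "w": w})

# Quotas that oversample responders relative to a simple random sample.
quotas = {(1, True): 60, (1, False): 40, (0, True): 60, (0, False): 40}
selected, fractions = oads_subsample(cohort, quotas)
```

Because the realized sampling fractions differ across strata, an analysis of the selected patients has to account for the design, which is the role the abstract assigns to the proposed estimator.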