Analyzing discrete competing risks data with partially overlapping or independent data sources and nonstandard sampling schemes, with application to cancer registries
Author(s) - Lee Minjung, Feuer Eric J., Wang Zhuoqiao, Cho Hyunsoon, Zou Zhaohui, Hankey Benjamin F., Mariotto Angela B., Fine Jason P.
Publication year - 2019
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.8381
Subject(s) - computer science, covariate, data mining, hazard, sampling (signal processing), statistics, variance (accounting), flexibility (engineering), sampling design, population, econometrics, mathematics, machine learning, medicine, chemistry, accounting, organic chemistry, filter (signal processing), environmental health, business, computer vision
This paper demonstrates the flexibility of a general approach for the analysis of discrete time competing risks data that can accommodate complex data structures, different time scales for different causes, and nonstandard sampling schemes. The data may involve a single data source where all individuals contribute to analyses of both cause‐specific hazard functions, overlapping datasets where some individuals contribute to the analysis of the cause‐specific hazard function of only one cause while other individuals contribute to analyses of both cause‐specific hazard functions, or separate data sources where each individual contributes to the analysis of the cause‐specific hazard function of only a single cause. The approach is modularized into estimation and prediction. For the estimation step, the parameters and the variance‐covariance matrix can be estimated using widely available software. The prediction step utilizes a generic program with plug‐in estimates from the estimation step. The approach is illustrated with three prognostic models for stage IV male oral cancer using different data structures. The first model uses only men with stage IV oral cancer from population‐based registry data. The second model strategically extends the cohort to improve the efficiency of the estimates. The third model improves the accuracy for those with a lower risk of other causes of death, by bringing in an independent data source collected under a complex sampling design with additional other‐cause covariates. These analyses represent novel extensions of existing methodology, broadly applicable for the development of prognostic models capturing both the cancer and noncancer aspects of a patient's health.
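
To make the prediction step described above concrete, the following is a minimal sketch of how plug-in cause-specific hazard estimates can be turned into cumulative incidence predictions in discrete time. It uses the standard textbook relationship (the probability of failing from a cause in an interval equals the probability of surviving all causes up to that interval times the cause-specific hazard in it) and is not taken from the paper's own program. The function name `cumulative_incidence`, the argument names, and the hazard values in the usage example are hypothetical, and both hazards are assumed to be on a common discrete time scale, which is a simplification of the different time scales the paper allows.

```python
from typing import List, Tuple


def cumulative_incidence(
    h_cancer: List[float], h_other: List[float]
) -> Tuple[List[float], List[float], List[float]]:
    """Turn discrete cause-specific hazards into cumulative incidences.

    h_cancer[j] and h_other[j] are assumed (hypothetical inputs) to be the
    estimated probabilities of dying from cancer / other causes in interval
    j+1, given survival through interval j. Returns, per interval, the
    cumulative incidence of cancer death, of other-cause death, and the
    overall survival probability.
    """
    if len(h_cancer) != len(h_other):
        raise ValueError("hazard sequences must have the same length")

    surv = 1.0          # probability of being alive at the start of the interval
    f_cancer = 0.0      # running cumulative incidence, cancer death
    f_other = 0.0       # running cumulative incidence, other-cause death
    cif_cancer, cif_other, survival = [], [], []

    for hc, ho in zip(h_cancer, h_other):
        if hc < 0 or ho < 0 or hc + ho > 1:
            raise ValueError("hazards must be non-negative and sum to at most 1")
        # Fail from a given cause in this interval: alive so far, then that hazard.
        f_cancer += surv * hc
        f_other += surv * ho
        # Survive the interval only by escaping both causes.
        surv *= 1.0 - hc - ho
        cif_cancer.append(f_cancer)
        cif_other.append(f_other)
        survival.append(surv)

    return cif_cancer, cif_other, survival


# Usage with made-up hazards for two intervals (illustrative values only).
cif_c, cif_o, surv = cumulative_incidence([0.10, 0.20], [0.05, 0.05])
```

At every interval the two cumulative incidences and the overall survival sum to one, which is a quick sanity check on any plug-in hazards. In a modular workflow like the one the abstract describes, the hazards would come from separately fitted models (possibly from different data sources), and only this final combination step would need both sets of estimates.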
