Complexity and bias in cross‐sectional data with binary disease outcome in observational studies
Author(s) -
Wang MeiCheng,
Yang Yuchen
Publication year - 2020
Publication title -
Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.8812
Subject(s) - cross sectional study, population, observational study, outcome (game theory), disease, statistics, inference, demography, medicine, computer science, mathematics, environmental health, mathematical economics, artificial intelligence, sociology
A cross‐sectional population is defined as the population of individuals who are alive at the sampling or observation time. Cross‐sectionally sampled data with a binary disease outcome are commonly analyzed in observational studies to identify how covariates correlate with disease occurrence. It is generally understood that a cross‐sectional binary outcome is not as informative as longitudinally collected time‐to‐event data, but it is less well understood whether bias can exist in cross‐sectional data and how that bias is related to the population risk of interest. As the progression of a disease typically involves both time and disease status, we consider how the binary disease outcome from the cross‐sectional population is connected to the birth‐illness‐death process in the target population. We argue that the distribution of the cross‐sectional binary outcome differs from the risk distribution in the target population and that bias would typically arise when cross‐sectional data are used to draw inference about population risk. In general, the cross‐sectional risk probability is determined jointly by the population risk probability and the ratio of the duration of the diseased state to the duration of the disease‐free state. Through explicit formulas we conclude that bias can almost never be avoided with cross‐sectional data. We present the age‐specific risk probability (ARP) and argue that models based on ARP offer a compromise, though still biased, approach to understanding the population risk. An analysis based on Alzheimer's disease data is presented to illustrate the ARP model and possible critiques of the analysis results.
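The dependence of the cross‐sectional risk probability on both the population risk and the duration ratio can be illustrated with a minimal sketch. It is not the paper's formula: it assumes a simplified stationary population (constant birth rate, time‐invariant durations), in which the chance that a living individual is diseased at the sampling time equals the expected time spent diseased divided by the expected lifetime. The function names and the numbers in the usage example are hypothetical.

```python
def cross_sectional_prevalence(p, mean_diseased, mean_disease_free):
    """Probability that a living individual is diseased at the sampling time.

    Assumption (not taken from the paper): a stationary population, so the
    probability is expected time diseased / expected total lifetime.

    p                 -- population (lifetime) risk of ever developing disease
    mean_diseased     -- mean time spent diseased, among those who develop it
    mean_disease_free -- mean time spent disease-free, over all individuals
    """
    expected_time_diseased = p * mean_diseased
    return expected_time_diseased / (mean_disease_free + expected_time_diseased)


def cross_sectional_odds(p, mean_diseased, mean_disease_free):
    """Odds of being diseased at the sampling time under the same assumption.

    The odds factor into the population risk times the duration ratio.
    """
    return p * (mean_diseased / mean_disease_free)


if __name__ == "__main__":
    # Hypothetical inputs: 30% lifetime risk, 5 years diseased on average,
    # 70 years disease-free on average.
    p = 0.30
    prevalence = cross_sectional_prevalence(p, 5.0, 70.0)
    print(f"population risk:            {p:.3f}")
    print(f"cross-sectional prevalence: {prevalence:.3f}")
```

Under these assumptions the two quantities agree only when the mean diseased duration times (1 - p) equals the mean disease‐free duration; a short diseased state relative to the disease‐free state pulls the cross‐sectional probability well below the population risk.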