Combining Individual Participant Data and Summary Statistics from Both Continuously Valued and Binary Variables to Estimate Regression Parameters
Author(s) - Gurrin, Lyle C.; Turkovic, Lidija
Publication year - 2012
Publication title - Australian and New Zealand Journal of Statistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.434
H-Index - 41
eISSN - 1467-842X
pISSN - 1369-1473
DOI - 10.1111/j.1467-842x.2012.00647.x
Subject(s) - statistics, estimator, mathematics, context (archaeology), sample size determination, outcome (game theory), binary data, econometrics, standard error, standard deviation, population, binary number, medicine, paleontology, arithmetic, mathematical economics, environmental health, biology
Summary - Recent research has extended standard methods for meta‐analysis to more general forms of evidence synthesis, where the aim is to combine different data types or data summaries that contain information about functions of multiple parameters in order to make inferences about the parameters of interest. We consider one such scenario, in which the goal is to make inferences about the association between a primary binary exposure and a continuously valued outcome in the context of several confounding exposures, and where the data are available in several forms: individual participant data (IPD) with repeated measures, sample means that have been aggregated over strata, and binary data generated by thresholding the underlying continuously valued outcome measure. We show that an estimator of the population mean of a continuously valued outcome can be constructed from binary threshold data, provided that a separate estimate of the outcome standard deviation is available. The results of a simulation study show that this estimator has negligible bias but is less efficient than the sample mean; the minimum variance ratio is derived from a Taylor series expansion. Combining this estimator with sample means and IPD from different sources (such as a series of published studies), using both linear and probit regression, does, however, improve the precision of estimation considerably by incorporating data that would otherwise have been excluded for being in the wrong format. We apply these methods to investigate the association between the G277S mutation in the transferrin gene and serum ferritin (iron) levels, separately in pre‐ and post‐menopausal women, based on data from three published studies.
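
The threshold-based estimator described in the summary can be illustrated with a minimal sketch. It is not the authors' code, and it rests on assumptions that the summary does not state: the outcome is taken to be normally distributed, the standard deviation is treated as known rather than estimated, and the binary data record whether the outcome exceeds a threshold `c`. Under those assumptions P(Y > c) = Φ((μ − c)/σ), so the mean can be recovered as μ = c + σ·Φ⁻¹(p). All function names below are hypothetical. The function `variance_ratio` is a first-order Taylor (delta-method) approximation worked out for this sketch, and it may differ from the expression used in the paper.

```python
import random
from statistics import NormalDist, fmean, variance

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def mean_from_threshold(n_above, n_total, threshold, sd):
    """Estimate the mean of a normal outcome from the count above a threshold.

    Assumes P(Y > c) = Phi((mu - c) / sd), hence mu = c + sd * Phi^{-1}(p).
    `sd` must come from a separate source; here it is treated as known.
    """
    p_hat = n_above / n_total
    if not 0.0 < p_hat < 1.0:
        raise ValueError("proportion above threshold must lie strictly in (0, 1)")
    return threshold + sd * STD_NORMAL.inv_cdf(p_hat)


def variance_ratio(p):
    """First-order ratio Var(threshold estimator) / Var(sample mean).

    Delta method with known sd: Var(mu_hat) ~ sd^2 * p(1 - p) / (n * phi(z)^2),
    where z = Phi^{-1}(p), while Var(sample mean) = sd^2 / n.
    """
    z = STD_NORMAL.inv_cdf(p)
    return p * (1.0 - p) / STD_NORMAL.pdf(z) ** 2


def simulate(mu=50.0, sd=10.0, threshold=55.0, n=200, reps=2000, seed=1):
    """Draw normal samples, keep only the binary threshold data, re-estimate mu."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        n_above = sum(rng.gauss(mu, sd) > threshold for _ in range(n))
        estimates.append(mean_from_threshold(n_above, n, threshold, sd))
    return estimates


if __name__ == "__main__":
    est = simulate()
    print("mean of estimates:", fmean(est))
    print("empirical variance ratio:", variance(est) / (10.0 ** 2 / 200))
    print("first-order ratio at p = 0.5:", variance_ratio(0.5))
```

Within this sketch the first-order ratio is smallest when the threshold coincides with the mean (p = 0.5), where it equals π/2; it grows as the threshold moves into either tail. When the standard deviation is itself estimated, as in the setting the summary describes, the estimator carries additional uncertainty that this sketch ignores.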