A highly efficient design strategy for regression with outcome pooling

Author(s) - Emily M. Mitchell, Robert H. Lyles, Amita K. Manatunga, Neil J. Perkins, Enrique F. Schisterman

Publication year - 2014

Publication title - Statistics in Medicine

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.996

H-Index - 183

eISSN - 1097-0258

pISSN - 0277-6715

DOI - 10.1002/sim.6305

Subject(s) - pooling, cluster analysis, computer science, regression, regression analysis, data mining, outcome (game theory), sample size determination, selection (genetic algorithm), statistics, linear regression, machine learning, artificial intelligence, mathematics, mathematical economics

The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting. Copyright © 2014 John Wiley & Sons, Ltd.
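The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a single predictor, that a pooled assay returns the arithmetic mean of the member specimens' outcomes, and that the pooled means are analyzed by weighted least squares with weights equal to pool size. The function names (`kmeans_1d`, `pool_by_label`, `wls_line`) and the simulated data are hypothetical and are not taken from the paper or from the BioCycle study. Grouping subjects with similar predictor values keeps the pool-level predictor means spread out, and that spread is what determines the precision of the slope estimate.

```python
import random


def kmeans_1d(x, k, iters=100, seed=0):
    """Lloyd's algorithm on one predictor; returns a pool label per subject."""
    rng = random.Random(seed)
    centers = rng.sample(x, k)  # k observed values as starting centers
    labels = [-1] * len(x)
    for _ in range(iters):
        # Assign each subject to the nearest center.
        new_labels = [
            min(range(k), key=lambda j: (xi - centers[j]) ** 2) for xi in x
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members (keep it if empty).
        for j in range(k):
            members = [xi for xi, lab in zip(x, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def pool_by_label(x, y, labels):
    """Return (pool size, mean predictor, mean outcome) for each pool.

    Assumption: the lab assay on a pool measures the mean outcome.
    """
    pools = {}
    for xi, yi, lab in zip(x, y, labels):
        pools.setdefault(lab, []).append((xi, yi))
    result = []
    for members in pools.values():
        n = len(members)
        x_mean = sum(m[0] for m in members) / n
        y_mean = sum(m[1] for m in members) / n
        result.append((n, x_mean, y_mean))
    return result


def wls_line(pooled):
    """Weighted least squares line through pool means, weights = pool sizes."""
    total = sum(n for n, _, _ in pooled)
    x_bar = sum(n * xm for n, xm, _ in pooled) / total
    y_bar = sum(n * ym for n, _, ym in pooled) / total
    sxx = sum(n * (xm - x_bar) ** 2 for n, xm, _ in pooled)
    sxy = sum(n * (xm - x_bar) * (ym - y_bar) for n, xm, ym in pooled)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Simulated illustration (hypothetical data): 200 subjects, 20 assays.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

labels = kmeans_1d(x, k=20)
pooled = pool_by_label(x, y, labels)
intercept, slope = wls_line(pooled)
print(len(pooled), round(intercept, 2), round(slope, 2))
```

With several predictors, the same steps would apply with k-means run on the full predictor vector and a multiple regression fitted to the pool means; a library clustering routine could replace the hand-written `kmeans_1d`.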
