Comparison of machine learning methods for estimating case fatality ratios: An Ebola outbreak simulation study

Alpha Forna; Ilaria Dorigatti; Pierre Nouvellet; Christl A. Donnelly

Open Access

Publication year - 2021

Publication title - PLOS ONE

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.99

H-Index - 332

ISSN - 1932-6203

DOI - 10.1371/journal.pone.0257005

Subject(s) - missing data, imputation (statistics), statistics, case fatality rate, context (archaeology), random forest, computer science, outbreak, medicine, epidemiology, mathematics, machine learning, geography, virology, archaeology

Background: Machine learning (ML) algorithms are increasingly used in infectious disease epidemiology. Epidemiologists should understand how ML algorithms behave with outbreak data, in which missing data are almost ubiquitous.

Methods: Using simulated data, we use an ML algorithmic framework to evaluate data imputation performance and the resulting case fatality ratio (CFR) estimates, focusing on the scale and type of data missingness: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR).

Results: Across ML methods, dataset sizes and proportions of training data used, the area under the receiver operating characteristic curve (AUC) decreased by 7% (median; range 1%–16%) when missingness increased from 10% to 40%. The overall reduction in CFR bias for MAR across methods, proportions of missingness, outbreak sizes and proportions of training data was 0.5% (median; range 0%–11%).

Conclusion: ML methods could reduce bias and increase the precision of CFR estimates at low levels of missingness. However, no method is robust to high percentages of missingness. Thus, a data-centric approach is recommended in outbreak settings: patient survival outcome data should be prioritised for collection, and random-sample follow-ups should be implemented to ascertain missing outcomes.
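To make the three missingness mechanisms and the follow-up recommendation concrete, here is a minimal sketch. It is not the authors' code or their ML framework. It simulates a hypothetical outbreak in which each case has one binary covariate `x` (a made-up high-risk group) and a binary outcome. It masks outcomes under MCAR, MAR or MNAR using illustrative probabilities, then compares three CFR estimators. The first is complete-case analysis. The second is a simple stratified imputation that stands in for the paper's ML imputers; it is a swapped-in technique, not one of the methods evaluated in the study. The third is a random-sample follow-up of missing cases, which assumes the followed-up outcomes are ascertained without error. All function names, probabilities and sample sizes are hypothetical.

```python
import random

# Hypothetical simulation parameters (assumptions, not taken from the paper).
P_HIGH_RISK = 0.5              # share of cases in the high-risk group (x = 1)
P_DEATH = {0: 0.3, 1: 0.7}     # probability of death by group


def simulate_cases(n, seed=1):
    """Return a list of (x, died) pairs for n simulated cases."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        x = 1 if rng.random() < P_HIGH_RISK else 0
        died = 1 if rng.random() < P_DEATH[x] else 0
        cases.append((x, died))
    return cases


def missing_prob(mechanism, x, died):
    """Probability that an outcome is unrecorded; each mechanism averages about 40%."""
    if mechanism == "MCAR":
        return 0.4                      # unrelated to covariate or outcome
    if mechanism == "MAR":
        return 0.6 if x else 0.2        # depends only on the observed covariate x
    if mechanism == "MNAR":
        return 0.6 if died else 0.2     # depends on the unobserved outcome itself
    raise ValueError(f"unknown mechanism: {mechanism}")


def apply_missingness(cases, mechanism, seed=2):
    """Return a parallel list of booleans: True where the outcome is missing."""
    rng = random.Random(seed)
    return [rng.random() < missing_prob(mechanism, x, died) for x, died in cases]


def complete_case_cfr(cases, missing):
    """Deaths divided by cases with a recorded outcome (missing outcomes dropped)."""
    observed = [died for (_, died), m in zip(cases, missing) if not m]
    return sum(observed) / len(observed)


def stratified_cfr(cases, missing):
    """Impute missing outcomes with the observed CFR of the same x group (valid under MAR given x)."""
    total_deaths = 0.0
    for group in (0, 1):
        idx = [i for i, (x, _) in enumerate(cases) if x == group]
        obs = [cases[i][1] for i in idx if not missing[i]]
        total_deaths += len(idx) * sum(obs) / len(obs)
    return total_deaths / len(cases)


def followup_cfr(cases, missing, n_followup, seed=3):
    """Ascertain outcomes for a simple random sample of missing cases and combine."""
    rng = random.Random(seed)
    miss_idx = [i for i, m in enumerate(missing) if m]
    sample = rng.sample(miss_idx, min(n_followup, len(miss_idx)))
    cfr_among_missing = sum(cases[i][1] for i in sample) / len(sample)
    observed_deaths = sum(died for (_, died), m in zip(cases, missing) if not m)
    # Observed deaths plus the estimated number of deaths among missing outcomes.
    return (observed_deaths + len(miss_idx) * cfr_among_missing) / len(cases)


if __name__ == "__main__":
    cases = simulate_cases(100_000)
    true_cfr = sum(died for _, died in cases) / len(cases)
    print(f"true CFR: {true_cfr:.3f}")
    for mech in ("MCAR", "MAR", "MNAR"):
        miss = apply_missingness(cases, mech)
        print(f"{mech}: complete-case {complete_case_cfr(cases, miss):.3f}, "
              f"stratified {stratified_cfr(cases, miss):.3f}, "
              f"follow-up {followup_cfr(cases, miss, 2000):.3f}")
```

With these illustrative settings, the complete-case CFR should stay close to the true value only under MCAR. Stratified imputation should also recover it under MAR, because missingness there depends only on `x`. Under MNAR both are biased, because the probability that an outcome is missing depends on the outcome itself. Only the random follow-up estimate should stay close to the true value in that case. This mirrors the intuition behind the data-centric recommendation in the Conclusion.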
