FIBER: enabling flexible retrieval of electronic health records data for clinical predictive modeling

Open Access

Author(s) - Suparno Datta, Jan Philipp Sachs, Harry Freitas Da Cruz, Tom Martensen, Philipp Bode, Ariane Morassi Sasso, Benjamin S. Glicksberg, Erwin P. Böttinger

Publication year - 2021

Publication title - JAMIA Open

Language(s) - English

Resource type - Journals

ISSN - 2574-2531

DOI - 10.1093/jamiaopen/ooab048

Subject(s) - computer science, python (programming language), data warehouse, schema (genetic algorithms), electronic health record, data mining, information retrieval, database, data science, health care, economics, economic growth, operating system

Objectives: The development of clinical predictive models hinges upon the availability of comprehensive clinical data. Tapping into such resources requires considerable effort from clinicians, data scientists, and engineers. Specifically, these efforts are focused on the data extraction and preprocessing steps required prior to modeling, including complex database queries. A handful of software libraries exist that can reduce this complexity by building upon data standards. However, a gap remains concerning electronic health records (EHRs) stored in star schema clinical data warehouses, an approach often adopted in practice. In this article, we introduce the FlexIBle EHR Retrieval (FIBER) tool: a Python library built on top of a star schema (i2b2) clinical data warehouse that enables flexible generation of modeling-ready cohorts as data frames.

Materials and Methods: FIBER was developed on top of a large-scale star schema EHR database that contains data from 8 million patients and over 120 million encounters. To illustrate FIBER's capabilities, we present its application by building a heart surgery patient cohort with subsequent prediction of acute kidney injury (AKI) with various machine learning models.

Results: Using FIBER, we were able to build the heart surgery cohort (n = 12 061), identify the patients that developed AKI (n = 1005), and automatically extract relevant features (n = 774). Finally, we trained machine learning models that achieved area under the curve values of up to 0.77 for this exemplary use case.

Conclusion: FIBER is an open-source Python library developed for extracting information from star schema clinical data warehouses. It reduces time-to-modeling, helping to streamline the clinical modeling process.
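
As a rough illustration of the kind of star schema query that cohort retrieval involves, the sketch below builds a tiny in-memory SQLite database with one fact table and two dimension tables. It then selects patients with an index event, flags a later outcome, and averages a feature measured before the index event. This is not FIBER's API and not the i2b2 schema: every table, column, and function name here (`fact`, `dim_patient`, `dim_concept`, `build_cohort`, and so on) is a hypothetical stand-in, and the rows are made-up toy values rather than data from the study.

```python
import sqlite3

# Hypothetical miniature star schema. Table and column names are illustrative
# assumptions; they are NOT the i2b2 schema and NOT part of FIBER.
SCHEMA = """
CREATE TABLE dim_patient (patient_id INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE dim_concept (concept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact (
    patient_id  INTEGER,
    concept_id  INTEGER,
    age_in_days INTEGER,
    value       REAL
);
"""

# One query that joins the fact table to its dimensions and returns one row
# per cohort member: demographics, a pre-index feature, and an outcome flag.
COHORT_QUERY = """
WITH events AS (
    SELECT f.patient_id, c.name, f.age_in_days, f.value
    FROM fact AS f
    JOIN dim_concept AS c ON c.concept_id = f.concept_id
),
idx AS (
    -- First occurrence of the index event defines cohort entry.
    SELECT patient_id, MIN(age_in_days) AS index_age
    FROM events
    WHERE name = :index_concept
    GROUP BY patient_id
)
SELECT i.patient_id,
       p.gender,
       (SELECT AVG(e.value) FROM events AS e
         WHERE e.patient_id = i.patient_id
           AND e.name = :feature_concept
           AND e.age_in_days < i.index_age) AS feature_mean,
       EXISTS (SELECT 1 FROM events AS e
         WHERE e.patient_id = i.patient_id
           AND e.name = :outcome_concept
           AND e.age_in_days > i.index_age) AS outcome
FROM idx AS i
JOIN dim_patient AS p ON p.patient_id = i.patient_id
ORDER BY i.patient_id
"""


def make_demo_db():
    """Create an in-memory database filled with made-up toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_patient VALUES (?, ?)",
        [(1, "F"), (2, "M"), (3, "F")],
    )
    conn.executemany(
        "INSERT INTO dim_concept VALUES (?, ?)",
        [(10, "heart_surgery"), (20, "aki"), (30, "creatinine")],
    )
    conn.executemany(
        "INSERT INTO fact VALUES (?, ?, ?, ?)",
        [
            (1, 30, 19999, 1.9),   # lab value before surgery
            (1, 10, 20000, None),  # index event
            (1, 20, 20003, None),  # outcome after surgery
            (2, 30, 17999, 0.9),
            (2, 10, 18000, None),
            (3, 30, 15000, 1.0),   # no index event, so not in the cohort
        ],
    )
    return conn


def build_cohort(conn, index_concept, outcome_concept, feature_concept):
    """Return (patient_id, gender, feature_mean, outcome) rows for the cohort."""
    params = {
        "index_concept": index_concept,
        "outcome_concept": outcome_concept,
        "feature_concept": feature_concept,
    }
    return conn.execute(COHORT_QUERY, params).fetchall()


if __name__ == "__main__":
    demo = make_demo_db()
    for row in build_cohort(demo, "heart_surgery", "aki", "creatinine"):
        print(row)
```

Restricting the feature to events before the index event and the outcome to events after it keeps the outcome from leaking into the predictors. That is a design choice of this sketch, not a description of how FIBER defines its time windows.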