Machine Learning discovery of lung disease trajectories in premature infants
Author(s) - Ofman Gaston, Caballero Mauricio, Haiden Sadia, Nowogrodski Florencia, Hamvas Aaron, Tipple Trent, Kleeberger Steven, Alvarez-Paggi Damian, Polack Fernando
Publication year - 2021
Publication title - The FASEB Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.709
H-Index - 277
eISSN - 1530-6860
pISSN - 0892-6638
DOI - 10.1096/fasebj.2021.35.s1.02359
Subject(s) - bronchopulmonary dysplasia, cohort, medicine, disease, pediatrics, fraction of inspired oxygen, cohort study, lung disease, a priori and a posteriori, machine learning, gestational age, artificial intelligence, computer science, lung, pregnancy, philosophy, genetics, epistemology, mechanical ventilation, biology
Background: Bronchopulmonary dysplasia (BPD) affects 20‐40% of very low birth weight (VLBW) infants, who endure short‐ and long‐term limitations throughout their lives. Experts have classified the disease based on oxygen and pressure exposure at fixed time points on the continuum of disease progression, ignoring the rich respiratory and clinical history of infants up to that point, which is characterized by temporal fluctuations in oxygen requirements. As a consequence, most prediction models for BPD have proven to be of limited clinical utility, missing strategic early opportunities to improve outcomes.

Objective: Our objective is to study premature lung disease endotypes with a data‐driven, hypothesis‐generating approach, using machine learning (ML) algorithms to identify endotypes that correlate reliably with comorbidities and outcomes.

Design/Methods: Clusters (endotypes) were constructed from longitudinal data without any a priori classification such as the canonical labels “severe” or “mild” BPD. We used the fraction of inspired oxygen (FiO2) to construct trajectories, which were then clustered with the anchored k‐medoids algorithm. We leveraged two existing large longitudinal preterm cohorts: we explored our hypothesis in the Discovery‐BPD Program (DBPD) cohort and validated our results with a second, independent data set from the PROP cohort. To match the inclusion criteria of the PROP cohort, VLBW infants in the DBPD cohort born at more than 29 weeks of gestation were excluded from the comparison analysis (DBPD‐P).

Results: The DBPD and PROP cohorts have comparable demographic characteristics. Unsupervised classification with ML algorithms yielded 3 distinct clusters. Individual trajectories are colored according to the cluster membership assigned by the unsupervised algorithm and then grouped by cluster (Fig. 1).
Comorbidities such as intraventricular hemorrhage, sepsis, and days of mechanical ventilation segregate better when analyzed by ML cluster than by standard BPD category. Additionally, re‐hospitalizations and death in the first year of life (Fig. 2) align better with the ML clusters, further supporting cluster assignment as a superior categorization.

Conclusion(s): Our preliminary results demonstrate an advantage of our ML model over the conventional BPD definition. We hope that this will enable a better understanding of the molecular basis underlying individual endotypes, facilitating the development of better diagnostics and, eventually, personalized treatments for each endotype.
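
For readers unfamiliar with trajectory clustering, the following is a minimal sketch of how FiO2 trajectories might be grouped without any prior BPD label. It is not the study's pipeline: it uses plain alternating k‐medoids with Euclidean distance rather than the anchored k‐medoids variant named in the methods, and the synthetic trajectories, the 12‐week grid, the farthest‐first initialization, and all function names are assumptions made for illustration only.

```python
import numpy as np


def farthest_first_init(dist, k):
    """Deterministic start: the most central point, then repeatedly the
    point farthest from the medoids chosen so far (illustrative choice)."""
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    return np.array(medoids)


def k_medoids(trajectories, k, n_iter=100):
    """Plain alternating k-medoids on equal-length trajectories.

    Returns (labels, medoid_indices). Each medoid is an observed
    trajectory, so every cluster is represented by a real infant's curve.
    """
    X = np.asarray(trajectories, dtype=float)
    # Pairwise Euclidean distance between whole trajectories.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = farthest_first_init(dist, k)
    for _ in range(n_iter):
        # Assignment step: each trajectory joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # Update step: the member with the smallest total distance
            # to the rest of its cluster becomes the new medoid.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return labels, medoids


# Synthetic weekly FiO2 values (fractions), NOT study data.
weeks = np.arange(12)
shapes = [
    np.full(12, 0.21),                    # room air throughout
    0.21 + 0.30 * np.exp(-weeks / 4.0),   # early need, then weaning
    np.full(12, 0.50),                    # persistently high requirement
]
trajectories = np.array(
    [shape + 0.005 * i for shape in shapes for i in range(5)]
)

labels, medoids = k_medoids(trajectories, k=3)
print(labels)
```

Medoids, unlike the centroids of k‐means, are actual observed trajectories, which keeps each cluster representative clinically interpretable. Real FiO2 records are irregularly sampled and noisy, so a practical analysis would also need resampling onto a common time grid and a principled way of choosing the number of clusters.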