The impact of segmentation on whole‐lung functional MRI quantification: Repeatability and reproducibility from multiple human observers and an artificial neural network
Author(s) -
Willers Corin,
Bauman Grzegorz,
Andermatt Simon,
Santini Francesco,
Sandkühler Robin,
Ramsey Kathryn A.,
Cattin Philippe C.,
Bieri Oliver,
Pusterla Orso,
Latzin Philipp
Publication year - 2021
Publication title -
Magnetic Resonance in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.696
H-Index - 225
eISSN - 1522-2594
pISSN - 0740-3194
DOI - 10.1002/mrm.28476
Subject(s) - reproducibility, repeatability, intraclass correlation, nuclear medicine, medicine, coefficient of variation, magnetic resonance imaging, radiology, mathematics, statistics
Purpose To investigate the repeatability and reproducibility of lung segmentation and their impact on the quantitative outcomes from functional pulmonary MRI. Additionally, to validate an artificial neural network (ANN) to accelerate whole-lung quantification.
Methods Ten healthy children and 25 children with cystic fibrosis underwent matrix pencil decomposition MRI (MP-MRI). Impaired relative fractional ventilation (R_FV) and relative perfusion (R_Q) from MP-MRI were compared using whole-lung segmentation performed by a physician at two time points (A_t1 and A_t2), by an MRI technician (B), and by an ANN (C). Repeatability and reproducibility were assessed with the Dice similarity coefficient (DSC), the paired t-test, and the intraclass correlation coefficient (ICC).
Results The repeatability within an observer (A_t1 vs A_t2) resulted in a DSC of 0.94 ± 0.01 (mean ± SD) and an unsystematic difference of −0.01% for R_FV (P = .92) and +0.1% for R_Q (P = .21). The reproducibility between human observers (A_t1 vs B) resulted in a DSC of 0.88 ± 0.02 and a systematic absolute difference of −0.81% (P < .001) for R_FV and −0.38% (P = .037) for R_Q. The reproducibility between the human observer and the ANN (A_t1 vs C) resulted in a DSC of 0.89 ± 0.03 and a systematic absolute difference of −0.36% for R_FV (P = .017) and −0.35% for R_Q (P = .002). The ICC was >0.98 for all variables and comparisons.
Conclusions Despite high overall agreement, there were systematic differences in lung segmentation between observers. This needs to be considered for longitudinal studies and could be overcome by using an ANN, which performs as well as human observers and fully automates MP-MRI post-processing.
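For readers unfamiliar with the Dice similarity coefficient used to compare segmentations, it is defined for two binary masks A and B as DSC = 2|A ∩ B| / (|A| + |B|), so 1 means identical masks and 0 means no overlap. The following is a minimal sketch, assuming the lung masks are available as boolean NumPy arrays of equal shape. The function name `dice_similarity` and the toy 4×4 masks are illustrative assumptions and are not taken from the study, whose implementation is not described in this record.

```python
import numpy as np


def dice_similarity(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks (hypothetical helper)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    overlap = np.logical_and(a, b).sum()
    return 2.0 * overlap / total


# Toy example: two "observers" outline slightly different regions.
observer_a = np.zeros((4, 4), dtype=bool)
observer_a[1:3, 1:3] = True   # 4 pixels
observer_b = np.zeros((4, 4), dtype=bool)
observer_b[1:3, 1:4] = True   # 6 pixels, 4 of them shared with observer_a

dsc = dice_similarity(observer_a, observer_b)
print(f"DSC = {dsc:.2f}")  # 2 * 4 / (4 + 6) = 0.80
```

In a study like the one summarized here, such a value would be computed per subject for each pair of segmentations and then reported as mean ± SD.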