
Data Augmentation and Transfer Learning to Improve Generalizability of an Automated Prostate Segmentation Model
Author(s) - Thomas Sanford, Ling Zhang, Stephanie Harmon, Jonathan Sackett, Dong Yang, Holger R. Roth, Ziyue Xu, Deepak Kesani, Sherif Mehralivand, Ronaldo Hueb Baroni, Tristan Barrett, Rossano Girometti, Aytekin Oto, Andrei S. Purysko, Sheng Xu, Peter A. Pinto, Daguang Xu, Bradford J. Wood, Peter L. Choyke, Barış Türkbey
Publication year - 2020
Publication title - American Journal of Roentgenology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.294
H-Index - 196
eISSN - 1546-3141
pISSN - 0361-803X
DOI - 10.2214/ajr.19.22347
Subject(s) - generalizability theory, artificial intelligence, segmentation, medicine, transfer of learning, overfitting, sørensen–dice coefficient, similarity (geometry), prostate, data set, computer science, machine learning, image segmentation, pattern recognition (psychology), artificial neural network, statistics, image (mathematics), mathematics, cancer
OBJECTIVE. Deep learning applications in radiology often suffer from overfitting, which limits generalization to external centers. The objective of this study was to develop a high-quality prostate segmentation model that maintains a high degree of performance across multiple independent datasets by using transfer learning and data augmentation.

MATERIALS AND METHODS. A retrospective cohort of 648 patients who underwent prostate MRI at a single center between February 2015 and November 2018 was used for training and validation. Training used a deep learning approach that combined 2D and 3D architectures and incorporated transfer learning. The data augmentation strategy was specific to the deformations, intensity, and alterations in image quality seen on radiology images. Five independent datasets, four of which were from outside centers, were used for testing, which was conducted with and without fine-tuning of the original model. The Dice similarity coefficient was used to evaluate model performance.

RESULTS. When prostate segmentation models using transfer learning were applied to the internal validation cohort, the mean Dice similarity coefficient was 93.1 for whole prostate segmentation and 89.0 for transition zone segmentation. When the models were applied to multiple test-set cohorts, the improvement in performance achieved using data augmentation alone was 2.2% for the whole prostate models and 3.0% for the transition zone segmentation models. However, the best test-set results were obtained with models fine-tuned on test center data, with mean Dice similarity coefficients of 91.5 for whole prostate segmentation and 89.7 for transition zone segmentation.

CONCLUSION. Transfer learning allowed the development of a high-performing prostate segmentation model, and data augmentation and fine-tuning improved the performance of that model when it was applied to datasets from external centers.
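
For readers unfamiliar with the evaluation metric, the Dice similarity coefficient between a predicted mask and a reference mask is twice the number of shared foreground voxels divided by the total number of foreground voxels in both masks. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name `dice_coefficient`, the flattened-list mask representation, and the convention of returning 1.0 for two empty masks are all assumptions made for illustration. The values reported in the abstract (for example, 93.1) are presumably on a 0-100 scale.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks.

    DSC = 2 * |A intersect B| / (|A| + |B|), where A and B are the sets of
    foreground voxels. Returns 1.0 when both masks are empty (an assumed
    convention; the abstract does not specify one).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels marked as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Foreground voxel counts of each mask, added together.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example with 8 voxels (hypothetical data, not from the study).
predicted = [0, 1, 1, 1, 0, 0, 1, 0]
reference = [0, 1, 1, 0, 0, 1, 1, 0]
score = dice_coefficient(predicted, reference)
print(f"Dice = {score:.3f} ({100 * score:.1f} on a 0-100 scale)")
# prints "Dice = 0.750 (75.0 on a 0-100 scale)"
```

In the toy example each mask has four foreground voxels and three of them overlap, so the score is 2 × 3 / (4 + 4) = 0.75.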