Repeatability of radiomics and machine learning for DWI: Short‐term repeatability study of 112 patients with prostate cancer
Author(s) -
Merisaari Harri,
Taimen Pekka,
Shiradkar Rakesh,
Ettala Otto,
Pesola Marko,
Saunavaara Jani,
Boström Peter J.,
Madabhushi Anant,
Aronen Hannu J.,
Jambor Ivan
Publication year - 2020
Publication title - Magnetic Resonance in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.696
H-Index - 225
eISSN - 1522-2594
pISSN - 0740-3194
DOI - 10.1002/mrm.28058
Subject(s) - repeatability, receiver operating characteristic, artificial intelligence, mathematics, pattern recognition (psychology), wilcoxon signed rank test, computer science, nuclear medicine, medicine, statistics, mann–whitney u test
Purpose: To evaluate the repeatability of prostate DWI‐derived radiomics and machine learning methods for prostate cancer (PCa) characterization.

Methods: A total of 112 patients with diagnosed PCa underwent 2 prostate MRI examinations (Scan1 and Scan2) performed on the same day. DWI was performed using 12 b‐values (0–2000 s/mm²) and post‐processed using a kurtosis function, and PCa areas were annotated using whole mount prostatectomy sections. A total of 1694 radiomic features were calculated, including Sobel, Kirsch, Gradient, Zernike Moments, Gabor, Haralick, CoLIAGe, Haar wavelet coefficients, 3D analogue to Laws features, 2D contours, and corner detectors. Radiomics and 4 feature pruning methods (area under the receiver operator characteristic curve, maximum relevance minimum redundancy, Spearman’s ρ, Wilcoxon rank‐sum) were evaluated in terms of Scan1‐Scan2 repeatability using the intraclass correlation coefficient, ICC(3,1). Classification performance for clinically insignificant versus significant PCa (Gleason grade group 1 versus >1) was evaluated by the area under the receiver operator characteristic curve in an unseen random 30% data split.

Results: The ICC(3,1) values for conventional radiomics and feature pruning methods were in the range of 0.28–0.90. The machine learning classifications varied between Scan1 and Scan2, with the percentage of identical class labels between the two scans in the range of 61–81%. Surface‐to‐volume ratio and corner detector‐based features were among the most represented features with high repeatability (ICC(3,1) >0.75), consistently high ranking using all 4 feature pruning methods, and classification performance with area under the receiver operator characteristic curve >0.70.

Conclusion: Surface‐to‐volume ratio and corner detectors for prostate DWI led to good classification of unseen data and performed similarly in Scan1 and Scan2, in contrast to multiple conventional radiomic features.
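To make the two repeatability measures in the abstract concrete, here is a minimal Python sketch of ICC(3,1) (the Shrout and Fleiss two-way mixed, single-measure, consistency form) and of the percentage of identical class labels between two scans. This is not the authors' implementation: the function names `icc_3_1` and `label_agreement` and the example values are hypothetical, and the data are synthetic rather than taken from the study.

```python
import numpy as np


def icc_3_1(ratings):
    """ICC(3,1): two-way mixed effects, single measure, consistency.

    ratings: array of shape (n_subjects, k_sessions), e.g. one radiomic
    feature measured in Scan1 and Scan2 (k = 2) for each lesion.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()

    # Two-way ANOVA sums of squares (no replication).
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_subjects = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()
    ss_sessions = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_subjects - ss_sessions

    ms_subjects = ss_subjects / (n - 1)          # between-subject mean square
    ms_error = ss_error / ((n - 1) * (k - 1))    # residual mean square

    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


def label_agreement(labels_a, labels_b):
    """Fraction of cases that receive the same class label in both scans."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    return float((a == b).mean())


# Synthetic example (hypothetical values): Scan2 equals Scan1 plus a constant
# offset, which the consistency form of the ICC does not penalize.
scan1 = [1.0, 2.0, 3.0, 4.0]
scan2 = [2.0, 3.0, 4.0, 5.0]
icc = icc_3_1(np.column_stack([scan1, scan2]))

# Hypothetical predicted labels (1 = Gleason grade group >1) from each scan.
agreement = label_agreement([1, 0, 1, 1], [1, 1, 1, 0])
```

Because ICC(3,1) measures consistency rather than absolute agreement, a systematic shift between Scan1 and Scan2 leaves the coefficient unchanged; only subject-specific scatter lowers it.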