Multiple testing in fMRI: An empirical case study on the balance between sensitivity, specificity, and stability
Author(s) -
Durnez Joke,
Roels Sanne P.,
Moerkerke Beatrijs
Publication year - 2014
Publication title -
Biometrical Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.108
H-Index - 63
eISSN - 1521-4036
pISSN - 0323-3847
DOI - 10.1002/bimj.201200056
Subject(s) - bonferroni correction , false discovery rate , multiple comparisons problem , thresholding , sensitivity (control systems) , context (archaeology) , stability (learning theory) , type i and type ii errors , computer science , pattern recognition (psychology) , word error rate , artificial intelligence , statistics , statistical hypothesis testing , mathematics , machine learning , biology , paleontology , biochemistry , electronic engineering , gene , engineering , image (mathematics)
Functional Magnetic Resonance Imaging (fMRI) is a widespread technique in cognitive psychology that allows brain activation to be visualized. The data analysis involves an enormous number of simultaneous statistical tests. Procedures that control either the familywise error rate (FWER) or the false discovery rate (FDR) have been applied to these data. These methods are mostly validated in terms of average sensitivity and specificity. However, procedures are not comparable when the requirements on their error rates differ. Moreover, less attention has been paid to the instability, or variability, of results. In a simulation study in the imaging context, we first compare the Bonferroni and Benjamini–Hochberg procedures. Viewing Bonferroni as a way to control the expected number of type I errors enables more lenient thresholding than FWER control and allows a direct comparison between the two procedures. We point out that, while the same balance between average sensitivity and specificity is obtained, the Benjamini–Hochberg procedure appears less stable. Second, we implement the procedure of Gordon et al. (2009), originally proposed for gene selection, which incorporates stability, measured through bootstrapping, into the decision criterion. Simulations indicate that this method attains the same balance between sensitivity and specificity. It improves the stability of Benjamini–Hochberg but does not outperform Bonferroni, which makes this computationally heavy bootstrap procedure less appealing. Third, we show how the stability of thresholding procedures can be assessed using real data. In a dataset on face recognition, we again find that Bonferroni yields more stable results.
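The two thresholding procedures compared in the abstract can be sketched in a few lines of Python. This is not the authors' implementation, only a minimal illustration of the standard definitions: Bonferroni rejects a hypothesis when its p-value is at most α/m, while the Benjamini–Hochberg step-up procedure finds the largest rank k with sorted p-value p_(k) ≤ kq/m and rejects the k smallest p-values. The toy data (5 strong signals among 100 tests) are invented for demonstration.

```python
import numpy as np

def bonferroni_threshold(pvals, alpha=0.05):
    """Reject H0 when p <= alpha / m (controls the familywise error rate)."""
    m = len(pvals)
    return np.asarray(pvals) <= alpha / m

def benjamini_hochberg(pvals, q=0.05):
    """Step-up procedure: find the largest k with p_(k) <= k*q/m
    and reject the k smallest p-values (controls the false discovery rate)."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    ranked = pvals[order]
    below = ranked <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # index of the largest passing rank
        reject[order[: k + 1]] = True
    return reject

# Toy example: 5 strong signals among 100 tests.
rng = np.random.default_rng(0)
p = np.concatenate([rng.uniform(0, 1e-4, 5),   # "active" voxels
                    rng.uniform(0, 1, 95)])    # null voxels
print("Bonferroni rejections:", bonferroni_threshold(p).sum())
print("Benjamini-Hochberg rejections:", benjamini_hochberg(p).sum())
```

At the same nominal level, every Bonferroni rejection is also a Benjamini–Hochberg rejection, so BH is the more lenient (and, per the abstract, the less stable) of the two.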