Selection of an optimal neural network architecture for computer‐aided detection of microcalcifications—Comparison of automated optimization techniques
Author(s) -
Gurcan Metin N.,
Sahiner Berkman,
Chan Heang-Ping,
Hadjiiski Lubomir,
Petrick Nicholas
Publication year - 2001
Publication title - Medical Physics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.473
H-Index - 180
eISSN - 2473-4209
pISSN - 0094-2405
DOI - 10.1118/1.1395036
Subject(s) - simulated annealing, computer science, artificial neural network, artificial intelligence, pattern recognition (psychology), kernel (algebra), genetic algorithm, convolutional neural network, receiver operating characteristic, algorithm, machine learning, mathematics, combinatorics
Many computer‐aided diagnosis (CAD) systems use neural networks (NNs) for either detection or classification of abnormalities. Currently, most NNs are “optimized” by manual search in a very limited parameter space. In this work, we evaluated the use of automated optimization methods for selecting an optimal convolution neural network (CNN) architecture. Three automated methods, steepest descent (SD), simulated annealing (SA), and the genetic algorithm (GA), were compared. We used as an example the CNN that classifies true and false microcalcifications detected on digitized mammograms by a prescreening algorithm. Four parameters of the CNN architecture were considered for optimization: the numbers of node groups and the filter kernel sizes in the first and second hidden layers, resulting in a search space of 432 possible architectures. The area Az under the receiver operating characteristic (ROC) curve was used to design a cost function. The SA experiments were conducted with four different annealing schedules. Three different parent selection methods were compared for the GA experiments. An available data set was split into two groups with approximately equal numbers of samples. By using the two groups alternately for training and testing, two different cost surfaces were evaluated. For the first cost surface, the SD method was trapped in a local minimum 91% (392/432) of the time. The SA using the Boltzmann schedule selected the best architecture after evaluating, on average, 167 architectures. The GA achieved its best performance with linearly scaled roulette‐wheel parent selection; however, it evaluated 391 different architectures, on average, to find the best one. The second cost surface contained no local minimum. For this surface, a simple SD algorithm could quickly find the global minimum, but the SA with the very fast reannealing schedule was still the most efficient. The same SA scheme, however, was trapped in a local minimum on the first cost surface. Our CNN study demonstrated that, if optimization is to be performed on a cost surface whose characteristics are not known a priori, it is advisable that a moderately fast algorithm such as an SA using a Boltzmann cooling schedule be used to conduct an efficient and thorough search, which may offer a better chance of reaching the global minimum.
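The sketch below is not the authors' code; it only illustrates the kind of search the abstract describes: simulated annealing with a Boltzmann (logarithmic) cooling schedule over a small discrete grid of architecture parameters. The candidate parameter values, the neighborhood move, and the cost function are hypothetical stand-ins; in the study the cost is derived from the ROC area Az of a CNN trained on the mammography data.

```python
# Illustrative sketch (assumed parameter grid and cost), not the published method:
# simulated annealing with a Boltzmann cooling schedule over a discrete
# architecture space defined by four CNN parameters.
import itertools
import math
import random

# Hypothetical candidate values for: node groups in hidden layers 1 and 2,
# and filter kernel sizes in hidden layers 1 and 2.
NODE_GROUPS_1 = [2, 4, 6, 8]
NODE_GROUPS_2 = [2, 4, 6]
KERNEL_1 = [3, 5, 7, 9]
KERNEL_2 = [3, 5, 7]
SPACE = list(itertools.product(NODE_GROUPS_1, NODE_GROUPS_2, KERNEL_1, KERNEL_2))

def cost(arch):
    """Placeholder cost, standing in for e.g. 1 - Az of the trained CNN.
    An arbitrary function of the parameters is used so the script runs
    without any image data."""
    n1, n2, k1, k2 = arch
    return 0.02 * abs(n1 - 6) + 0.03 * abs(n2 - 4) + 0.01 * abs(k1 - 7) + 0.01 * abs(k2 - 5)

def random_neighbor(arch):
    """Perturb one of the four parameters to an adjacent allowed value."""
    lists = [NODE_GROUPS_1, NODE_GROUPS_2, KERNEL_1, KERNEL_2]
    i = random.randrange(4)
    values = lists[i]
    j = values.index(arch[i]) + random.choice([-1, 1])
    j = max(0, min(len(values) - 1, j))
    new = list(arch)
    new[i] = values[j]
    return tuple(new)

def simulated_annealing(t0=1.0, steps=500):
    current = random.choice(SPACE)
    best = current
    for k in range(1, steps + 1):
        t = t0 / math.log(1 + k)  # Boltzmann (logarithmic) cooling schedule
        candidate = random_neighbor(current)
        delta = cost(candidate) - cost(current)
        # Accept downhill moves always; accept uphill moves with Boltzmann probability.
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = candidate
        if cost(current) < cost(best):
            best = current
    return best, cost(best)

if __name__ == "__main__":
    arch, c = simulated_annealing()
    print("best architecture (n1, n2, k1, k2):", arch, "cost:", round(c, 4))
```

The slow logarithmic cooling is what makes this schedule "moderately fast": it keeps accepting some uphill moves late in the run, which is the property the abstract credits with escaping local minima on the first cost surface, at the price of more architecture evaluations than a greedy SD search.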