
Deep learning‐based research on the influence of training data size for breast cancer pathology detection
Author(s) - Cui Chongyang, Fan Shangchun, Lei Han, Qu Xiaolei, Zheng Dezhi
Publication year - 2019
Publication title - The Journal of Engineering
Language(s) - English
Resource type - Journals
ISSN - 2051-3305
DOI - 10.1049/joe.2018.9093
Subject(s) - computer science, breast cancer, economic shortage, workload, artificial intelligence, set (abstract data type), training set, data set, training system, digital pathology, medical physics, cancer, pathology, pattern recognition (psychology), machine learning, medicine, linguistics, philosophy, government (linguistics), programming language, operating system, economics, economic growth
Pathological diagnosis of breast cancer is hampered by a shortage of pathologists, the difficulty of labeling samples, and the heavy workload of manual diagnosis. Deep learning‐based computer‐assisted pathology analysis systems have therefore been developed to diagnose breast cancer and have achieved impressive results. However, large training sets are hard to obtain because pathological images are scarce and labeling them is costly, so the size of the training set should be planned before such a system is built. Here, the authors present a study to determine the optimal training set size needed to achieve high classification accuracy when developing a pathology computer‐assisted breast cancer analysis system. The authors trained two kinds of CNNs on six different sizes of training set and then tested the resulting systems on a total of 10,000 images. All images were acquired from the Camelyon17 challenge. The authors also propose a scheme for determining the training set size and the model size when developing such systems, which can be readily applied to systems for other types of pathological images.
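The experimental design described in the abstract (train on several training set sizes, evaluate every model on one fixed test set) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the data are synthetic feature vectors rather than Camelyon17 patches, a nearest-centroid classifier stands in for the CNNs, and the six sizes in `SIZES` and all function names are hypothetical.

```python
import random

N_FEATURES = 8  # assumption: small synthetic feature vectors, not real image patches


def make_samples(n, rng):
    """Generate n synthetic two-class samples (a stand-in for labeled patches)."""
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label else -1.0
        features = [rng.gauss(center, 1.5) for _ in range(N_FEATURES)]
        samples.append((features, label))
    return samples


def fit_centroids(train):
    """Fit a nearest-centroid classifier (a placeholder for training a CNN)."""
    sums = {0: [0.0] * N_FEATURES, 1: [0.0] * N_FEATURES}
    counts = {0: 0, 1: 0}
    for x, y in train:
        counts[y] += 1
        for i, v in enumerate(x):
            sums[y][i] += v
    return {y: [s / max(counts[y], 1) for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist(centroids[y]))


def accuracy(centroids, test):
    """Fraction of test samples classified correctly."""
    return sum(predict(centroids, x) == y for x, y in test) / len(test)


def learning_curve(sizes, test_size=10_000, seed=0):
    """Train on nested subsets of one pool and score each model on a fixed test set."""
    rng = random.Random(seed)
    pool = make_samples(max(sizes), rng)
    test = make_samples(test_size, rng)  # same test set for every training size
    results = {}
    for n in sorted(sizes):
        model = fit_centroids(pool[:n])  # smaller sets are subsets of larger ones
        results[n] = accuracy(model, test)
    return results


SIZES = [20, 50, 100, 500, 1000, 5000]  # hypothetical sizes, not the paper's
curve = learning_curve(SIZES)
for n, acc in curve.items():
    print(f"train size {n:>5}: test accuracy {acc:.3f}")
```

Keeping the test set fixed and nesting the training subsets means that differences in accuracy between rows reflect training set size rather than a change in evaluation data. In a real study the accuracy at which the curve flattens would suggest a training set size beyond which additional labeling yields little gain.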