Building and preprocessing of image data using indices of representativeness and classification applied to granular product characterization
Author(s) -
Ros F.,
Guillaume S.,
Bellon‐Maurel V.,
Bertrand D.
Publication year - 1997
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/(sici)1099-128x(199711/12)11:6<469::aid-cem491>3.0.co;2-q
Subject(s) - representativeness heuristic , preprocessor , computer science , sample (material) , data mining , pattern recognition (psychology) , histogram , image (mathematics) , data pre processing , bottleneck , artificial intelligence , sample size determination , image processing , mathematics , statistics , chemistry , chromatography , embedded system
Characterizing granular products by image analysis is complex for two reasons: defining the sample size is difficult (should it be measured by weight or by number of particles?), and the data that can be extracted from an image are highly diverse. A three‐step procedure is applied: data extraction, data preprocessing and sample classification. This paper deals with the second step, which starts once the image data have been extracted and gathered into histograms with a large number of intervals. The proposed method both builds samples of optimal size and creates data vectors appropriate for the third step. Its originality lies in supervising the data processing by the final goal, namely discrimination into classes. Indices of stability and discrimination are created to build new histograms, and indices of representativeness and classification are used to determine the optimal sample size. The process has been tested on images of mill products divided into three classes. The optimal sample size given by the representativeness index is 18 images, whereas it drops to 13 using the classification index. For this example the features, considered independently, are not informative enough to solve the problem (the best classification performance is 60%), so a strategy that combines features is needed. This strategy is presented in a separate paper. © 1997 John Wiley & Sons, Ltd.
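
The abstract does not define the representativeness index, so the following is only an illustrative sketch of the general idea of choosing a sample size from pooled image histograms. It assumes a stand-in index: the L1 distance between the normalised histogram pooled over the first n images and the one pooled over all images. The function names, the distance measure and the tolerance are hypothetical and are not taken from the paper.

```python
def pooled_histogram(image_histograms):
    """Sum per-image bin counts and normalise so the bins sum to 1.

    Assumes at least one image and a non-zero total count.
    """
    totals = [sum(bins) for bins in zip(*image_histograms)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def representativeness_curve(image_histograms):
    """Hypothetical index (not the paper's definition).

    For each sample size n, return the L1 distance between the histogram
    pooled over the first n images and the histogram pooled over all images.
    Smaller values mean the n-image sample looks more like the full set.
    """
    reference = pooled_histogram(image_histograms)
    curve = []
    for n in range(1, len(image_histograms) + 1):
        pooled = pooled_histogram(image_histograms[:n])
        curve.append(sum(abs(p - r) for p, r in zip(pooled, reference)))
    return curve


def smallest_stable_size(curve, tolerance):
    """Smallest n whose index, and every later one, stays within tolerance."""
    for n in range(len(curve)):
        if all(value <= tolerance for value in curve[n:]):
            return n + 1
    return None


# Toy data: three images, each described by a 4-bin histogram of counts.
histograms = [[8, 2, 0, 0], [5, 5, 0, 0], [6, 4, 0, 0]]
curve = representativeness_curve(histograms)
print(curve, smallest_stable_size(curve, tolerance=0.1))
```

In this sketch the last value of the curve is always zero because the full set is compared with itself, so the result depends on the chosen tolerance and on the order in which images are added. A classification-based index, as mentioned in the abstract, would replace the distance with a measure of class discrimination.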