Best linear corrector of classification estimates of proportions of objects in several unknown classes
Author(s) -
Fortier Jean J.
Publication year - 1992
Publication title -
Canadian Journal of Statistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.804
H-Index - 51
eISSN - 1708-945X
pISSN - 0319-5724
DOI - 10.2307/3315572
Subject(s) - mathematics , confusion matrix , confusion , transformation (genetics) , inverse , matrix (chemical analysis) , statistics , class (philosophy) , measure (data warehouse) , pattern recognition (psychology) , computer science , artificial intelligence , data mining , geometry , psychology , biochemistry , chemistry , materials science , psychoanalysis , composite material , gene
It has been recognized that counting the objects allocated by a rule of classification to several unknown classes often does not provide good estimates of the true class proportions of the objects to be classified. We propose a linear transformation of these classification estimates, which minimizes the mean squared error of the transformed estimates over all possible sets of true proportions. This so‐called best‐linear‐corrector (BLC) transformation is a function of the confusion (classification‐error) matrix and of the first and second moments of the prior distribution of the vector of proportions. When the number of objects to be classified increases, the BLC tends to the inverse of the confusion matrix. The estimates that are obtained directly by this inverse‐confusion corrector (ICC) are also the maximum‐likelihood unbiased estimates of the probabilities that the objects originate from one or the other class, had the objects been preselected with those probabilities. But for estimating the actual proportions, the ICC estimates behave less well than the raw classification estimates for some collections. In that situation, the BLC is substantially superior to the ICC even for some large collections of objects and is always substantially superior to the raw estimates. The statistical model is applied concretely to the measure of forest covers in remote sensing.
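As an illustration of the inverse-confusion corrector (ICC) described in the abstract, the following minimal sketch applies the inverse of a two-class confusion matrix to raw classification proportions. It is not taken from the paper: the function name `inverse_confusion_correct`, the convention that `confusion[i][j]` is the probability of allocating an object of true class j to class i, and all numerical values are assumptions chosen for the example. The BLC itself is not reproduced here, since its formula is not given in the abstract.

```python
def inverse_confusion_correct(raw, confusion):
    """Apply the inverse of a 2x2 confusion matrix to raw proportion estimates.

    Assumed convention (not from the paper): confusion[i][j] is
    P(allocated to class i | true class j), so each column sums to 1
    and the expected raw proportions equal confusion times the true proportions.
    """
    (a, b), (c, d) = confusion
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("confusion matrix is singular; it cannot be inverted")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det],
           [-c / det, a / det]]
    return [inv[0][0] * raw[0] + inv[0][1] * raw[1],
            inv[1][0] * raw[0] + inv[1][1] * raw[1]]


# Hypothetical confusion matrix and true proportions, for illustration only.
confusion = [[0.9, 0.2],
             [0.1, 0.8]]
true_p = [0.3, 0.7]

# Expected raw classification proportions under the assumed model.
expected_raw = [
    confusion[0][0] * true_p[0] + confusion[0][1] * true_p[1],
    confusion[1][0] * true_p[0] + confusion[1][1] * true_p[1],
]

# Applied to the expected raw proportions, the ICC recovers the true ones.
corrected = inverse_confusion_correct(expected_raw, confusion)

# With a small collection, the observed raw proportions can deviate from
# their expectation, and the ICC output can then leave the interval [0, 1].
noisy_corrected = inverse_confusion_correct([0.15, 0.85], confusion)
```

In this sketch the corrected values sum to 1 whenever the raw proportions do, because the columns of the assumed confusion matrix sum to 1. The second call yields a negative first component, which is one simple way an inverse-confusion correction can behave poorly on a particular collection.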