The effect of mislabeled samples on the performance of the linear learning machine
Author(s) -
Lavine Barry K.,
Ward Anthony J. I.,
Han Jian Hwa,
Smith Roy-Keith,
Taylor Orley R.
Publication year - 1990
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.1180040106
Subject(s) - hyperplane, machine learning, artificial intelligence, computer science, class (philosophy), feature (linguistics), set (abstract data type), training set, linear classifier, support vector machine, mathematics, linguistics, philosophy, geometry, programming language
Over the past 15 years the linear learning machine has been applied to a large number of chemical problems. The approach is conceptually simple and requires no knowledge of the statistical distribution of the data. It is not without problems, however. One that has not been investigated is the influence of mislabeled samples on the position of the hyperplane in feature space. If a few samples in a data set are incorrectly tagged prior to training (i.e. samples that actually belong to class 1 are labeled as members of class 2), the linear learning machine can still achieve a classification success rate of 100% for the training set. However, because the hyperplane has been positioned to accommodate the mislabeled samples, unfavorable results will be obtained for the prediction set. The magnitude of this effect and its implications for the proper use of the linear learning machine are discussed.
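
The effect described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the two-dimensional toy data, the choice of which samples are mislabeled, the function and variable names, and the use of the classic perceptron error-correction rule as a stand-in for the linear learning machine. It is not the authors' data or training procedure. Three class 1 samples nearest the class boundary are tagged as class 2 before training; the data remain linearly separable, so training still converges to a hyperplane that reproduces every tagged label.

```python
def train_perceptron(samples, labels, max_epochs=10000):
    """Error-correction training; returns weights [w1, w2, bias]."""
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), y in zip(samples, labels):
            score = w[0] * x1 + w[1] * x2 + w[2]
            if y * score <= 0:  # wrong side of (or on) the hyperplane
                w[0] += y * x1
                w[1] += y * x2
                w[2] += y
                errors += 1
        if errors == 0:  # every training sample classified as tagged
            break
    return w


def predict(w, x):
    """Return +1 (class 2) or -1 (class 1) for a sample x = (x1, x2)."""
    return 1 if w[0] * x[0] + w[1] * x[1] + w[2] > 0 else -1


def accuracy(w, samples, labels):
    correct = sum(1 for x, y in zip(samples, labels) if predict(w, x) == y)
    return correct / len(samples)


# Hypothetical training set: class 1 is coded -1, class 2 is coded +1.
x2_values = (-1.0, 0.0, 1.0)
class1 = [(x1, x2) for x1 in (-3.0, -2.5, -2.0, -1.5, -1.0) for x2 in x2_values]
class2 = [(x1, x2) for x1 in (1.0, 1.5, 2.0, 2.5, 3.0) for x2 in x2_values]
train_x = class1 + class2
true_y = [-1] * len(class1) + [1] * len(class2)

# Mislabel the three class 1 samples closest to class 2 (x1 == -1.0).
tagged_y = [1 if (y == -1 and x[0] == -1.0) else y
            for x, y in zip(train_x, true_y)]

# Hypothetical prediction set, labeled correctly.
pred_x = [(-1.0, 0.5), (-1.0, -0.5), (-2.0, 0.5), (-2.5, -0.5),
          (1.0, 0.5), (1.5, -0.5), (2.0, 0.5), (2.5, -0.5)]
pred_y = [-1, -1, -1, -1, 1, 1, 1, 1]

w_clean = train_perceptron(train_x, true_y)
w_noisy = train_perceptron(train_x, tagged_y)

train_acc_noisy = accuracy(w_noisy, train_x, tagged_y)  # against tagged labels
pred_acc_clean = accuracy(w_clean, pred_x, pred_y)
pred_acc_noisy = accuracy(w_noisy, pred_x, pred_y)

print("training success rate, mislabeled set:", train_acc_noisy)
print("prediction success rate, clean training:", pred_acc_clean)
print("prediction success rate, mislabeled training:", pred_acc_noisy)
```

In this toy set-up the training success rate gives no warning that anything is wrong. The damage shows up only in the prediction set, where the class 1 samples lying near the mislabeled training samples fall on the wrong side of the shifted hyperplane.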