Detecting potential labeling errors in microarrays by data perturbation
Author(s) - Andrea Malossini, Enrico Blanzieri, Raymond T. Ng
Publication year - 2006
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btl346
Subject(s) - computer science, classifier (uml), computation, data mining, support vector machine, pattern recognition (psychology), artificial intelligence, algorithm, machine learning
Classification is widely used in medical applications, but the quality of a classifier depends critically on accurate labeling of the training data. In many medical applications, labeling a sample or grading a biopsy can be subjective. Existing studies confirm this and show that even a very small number of mislabeled samples can severely degrade the performance of the resulting classifier, particularly when the sample size is small. This paper addresses the problem of developing a method for automatically detecting samples that are possibly mislabeled.