Active learning support vector machines for optimal sample selection in classification
Author(s) -
Zomer Simeone,
Del Nogal Sánchez Miguel,
Brereton Richard G.,
Pérez Pavón José L.
Publication year - 2004
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.872
Subject(s) - labelling , support vector machine , classifier (uml) , computer science , artificial intelligence , pattern recognition (psychology) , machine learning , structured support vector machine , data mining , chemistry , biochemistry
Labelling samples is a procedure that may cause significant delays, particularly when dealing with large datasets and/or when labelling requires prolonged analysis. In such cases a strategy that allows the construction of a reliable classifier from a minimal training set, by labelling only a small fraction of the samples, can be advantageous. Support vector machines (SVMs) are ideal for such an approach because the classifier relies on only a small subset of samples, namely the support vectors, while being independent of the remaining ones, which typically form the majority of the dataset. This paper describes a procedure in which an SVM classifier is constructed with support vectors systematically retrieved from the pool of unlabelled samples. The procedure is termed 'active' because the algorithm interacts with the samples prior to their labelling rather than waiting passively for the input. The learning behaviour on simulated datasets is analysed, and a practical application to the detection of hydrocarbons in soils using mass spectrometry is described. Results on simulations show that the active learning SVM performs optimally on datasets where the classes display an intermediate level of separation. On the real case study the classifier correctly assesses the membership of all samples in the original dataset while requiring labels for only around 14% of the data. Its subsequent application to a second dataset of analogous nature also provides perfect classification without further labelling, giving the same outcome as most classical techniques based on the entirely labelled original dataset. Copyright © 2004 John Wiley &amp; Sons, Ltd.
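A minimal sketch of the margin-based active-learning idea summarized in the abstract, using scikit-learn. This is an illustrative reconstruction, not the authors' exact algorithm: the two simulated Gaussian classes, the label budget, and the query rule (label the unlabelled sample closest to the current decision boundary, since it is the most likely support-vector candidate) are all assumptions made for the demo.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two simulated classes with an intermediate level of separation (assumption).
X = np.vstack([rng.normal(-1.5, 1.0, size=(100, 2)),
               rng.normal(+1.5, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

labelled = [0, 100]                      # one seed sample per class
unlabelled = [i for i in range(200) if i not in labelled]

clf = SVC(kernel="linear", C=1.0)
for _ in range(25):                      # label budget (hypothetical)
    clf.fit(X[labelled], y[labelled])
    # Query the unlabelled sample nearest the margin and "ask the oracle"
    # (here: the known labels y) for its class.
    dist = np.abs(clf.decision_function(X[unlabelled]))
    pick = unlabelled.pop(int(np.argmin(dist)))
    labelled.append(pick)

clf.fit(X[labelled], y[labelled])
accuracy = clf.score(X, y)               # fraction of the whole pool classified correctly
print(len(labelled), round(accuracy, 3))
```

With this budget the classifier sees 27 of 200 samples (13.5% of the pool), in the same spirit as the roughly 14% labelling fraction reported in the abstract, though the numbers here come from the simulated data above, not from the paper's soil dataset.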
