Building text classifiers using positive, unlabeled and ‘outdated’ examples
Author(s) - Han Jiayu, Zuo Wanli, Liu Lu, Xu Yuanbo, Peng Tao
Publication year - 2016
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3879
Subject(s) - computer science, support vector machine, artificial intelligence, classifier (uml), machine learning, voting, transfer of learning, weighted voting, pattern recognition (psychology), data mining, politics, political science, law
Summary - Learning from positive and unlabeled examples (PU learning) is a form of partially supervised classification that is frequently used in Web and text retrieval systems. The merit of PU learning is that it can achieve good performance with less manual work. Motivated by transfer learning, this paper presents a novel method that transfers ‘outdated data’ into the PU learning process. We first propose a measure of feature strength and use it to select the strong features and the weak features. Then, we extract reliable negative examples and candidate negative examples using the strong and the weak features (Transfer‐1DNF). Finally, we construct a classifier called weighted voting iterative support vector machine (SVM), which is made up of several subclassifiers obtained by applying SVM iteratively; each subclassifier is assigned a weight in each iteration. We conduct experiments on two datasets, 20 Newsgroups and Reuters‐21578, and compare our method with three baseline algorithms: positive example‐based learning, weighted voting classifier and SVM. The results show that the proposed Transfer‐1DNF method extracts more reliable negative examples with lower error rates, and that our classifier outperforms the baseline algorithms. Copyright © 2016 John Wiley & Sons, Ltd.
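To make the reliable-negative extraction step concrete, the following is a minimal sketch of the classical 1-DNF heuristic used in positive example-based learning, one of the baselines named in the summary. It is not the paper's Transfer-1DNF: the summary does not give the feature-strength measure, the strong/weak thresholds, or how the outdated data enter, so none of those are modeled here. The function names, the document-frequency comparison, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter


def positive_features(positive_docs, unlabeled_docs):
    """Features whose document frequency is higher in P than in U.

    Assumption: this frequency comparison stands in for the paper's
    (unspecified) feature-strength measure. Both lists must be non-empty.
    """
    n_pos, n_unl = len(positive_docs), len(unlabeled_docs)
    pos_df = Counter(f for doc in positive_docs for f in set(doc))
    unl_df = Counter(f for doc in unlabeled_docs for f in set(doc))
    # Counter returns 0 for features never seen in the unlabeled set.
    return {f for f, c in pos_df.items() if c / n_pos > unl_df[f] / n_unl}


def reliable_negatives(positive_docs, unlabeled_docs):
    """Indices of unlabeled documents that contain no positive feature."""
    feats = positive_features(positive_docs, unlabeled_docs)
    return [i for i, doc in enumerate(unlabeled_docs) if not feats & set(doc)]


# Hypothetical toy data: each document is a list of tokens.
P = [["svm", "kernel", "margin"], ["svm", "classifier"]]
U = [
    ["svm", "margin", "data"],
    ["football", "match"],
    ["election", "vote", "data"],
    ["kernel", "trick"],
]

print(reliable_negatives(P, U))  # → [1, 2]
```

In an iterative scheme of the kind the summary describes, the documents returned here would seed the negative class for the first SVM, with later iterations reclassifying the remaining unlabeled documents; that loop and the subclassifier weighting are omitted because the summary does not specify them.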
