Iterative cross‐training: An algorithm for learning from unlabeled Web pages

Author(s) - Soonthornphisaj Nuanwan, Kijsirikul Boonserm

Publication year - 2004

Publication title - International Journal of Intelligent Systems

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.291

H-Index - 87

eISSN - 1098-111X

pISSN - 0884-8173

DOI - 10.1002/int.10157

Subject(s) - computer science, classifier (uml), web page, machine learning, training set, artificial intelligence, naive bayes classifier, co training, semi supervised learning, labeled data, information retrieval, world wide web, support vector machine

The article presents a new learning method, called iterative cross‐training (ICT), for classifying Web pages in three classification problems: (1) classification of Thai/non‐Thai Web pages, (2) classification of course/non‐course home pages, and (3) classification of university‐related Web pages. Given domain knowledge or a small set of labeled data, our method combines two classifiers that can effectively use unlabeled examples to iteratively train each other. We compare ICT against three other learning methods: a supervised word segmentation classifier, a supervised naïve Bayes classifier, and a co‐training‐style classifier. The experimental results on the three classification problems show that ICT performs better than the other classifiers. One advantage of ICT is that it needs only a small set of prelabeled data, or no prelabeled data at all when domain knowledge is available. © 2004 Wiley Periodicals, Inc.
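
The idea of two classifiers iteratively training each other on unlabeled examples can be illustrated with a short Python sketch. This is not the authors' implementation: the abstract does not give the algorithm's details, so the class `TinyNaiveBayes`, the function `iterative_cross_training`, the split of unlabeled pages into two pools, the reuse of the seed set in every round, and the fixed number of rounds are all assumptions made for illustration.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Multinomial naive Bayes over token lists with add-one smoothing.
    Hypothetical helper for this sketch, not taken from the article."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.total = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        return self

    def predict(self, tokens):
        def score(c):
            n = sum(self.counts[c].values())
            s = math.log(self.prior[c] / self.total)
            for t in tokens:
                # The extra +1 in the denominator reserves mass for unseen tokens.
                s += math.log((self.counts[c][t] + 1) / (n + len(self.vocab) + 1))
            return s
        return max(self.classes, key=score)


def iterative_cross_training(seed_docs, seed_labels, pool_a, pool_b, rounds=5):
    """Assumed scheme: each classifier labels the other's unlabeled pool,
    and the other classifier retrains on those machine-made labels."""
    clf_a = TinyNaiveBayes().fit(seed_docs, seed_labels)  # bootstrapped from the seed
    clf_b = TinyNaiveBayes()
    for _ in range(rounds):
        # Classifier A labels pool B; classifier B trains on the result.
        labels_b = [clf_a.predict(d) for d in pool_b]
        clf_b.fit(seed_docs + pool_b, seed_labels + labels_b)
        # Classifier B labels pool A; classifier A trains on the result.
        labels_a = [clf_b.predict(d) for d in pool_a]
        clf_a.fit(seed_docs + pool_a, seed_labels + labels_a)
    return clf_a, clf_b
```

In this sketch the small labeled seed plays the role of the prelabeled data mentioned in the abstract. Where domain knowledge is available instead, the first classifier could plausibly be replaced by a rule-based labeler, although how the article does this is not stated here.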
