Automated Retraining Methods for Document Classification and Their Parameter Tuning

Open Access

Author(s) - Stefan Siersdorfer, Gerhard Weikum

Publication year - 2005

Publication title - Lecture Notes in Computer Science

Language(s) - English

Resource type - Book series

SCImago Journal Rank - 0.249

H-Index - 400

eISSN - 1611-3349

pISSN - 0302-9743

ISBN - 3-540-30017-1

DOI - 10.1007/11581062_38

Subject(s) - retraining, computer science, classifier (uml), artificial intelligence, crawling, machine learning, training set, information retrieval, data mining, pattern recognition (psychology), medicine, business, anatomy, international trade

This paper addresses the problem of semi-supervised classification on document collections using retraining (also called self-training). A possible application is focused Web crawling, which may start with very few manually selected training documents but can be enhanced by automatically adding initially unlabeled, positively classified Web pages for retraining. Such an approach is not robust by itself and faces tuning problems regarding parameters such as the number of selected documents, the number of retraining iterations, and the ratio of positively to negatively classified samples used for retraining. The paper develops methods for automatically tuning these parameters, based on predicting the leave-one-out error for a retrained classifier and on preventing the classifier from being diluted by selecting too many or too weak documents for retraining. Our experiments with three different datasets confirm the practical viability of the approach.
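The retraining loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it assumes a simple centroid classifier with cosine similarity as a stand-in, and it fixes the tuning parameters by hand instead of deriving them from a predicted leave-one-out error. All names here (`self_train`, `per_class`, `min_conf`, `iterations`) are hypothetical and chosen only for illustration. `per_class` limits how many documents are added per class and iteration, and `min_conf` rejects weak documents so they do not dilute the classifier.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())

def centroid(vectors):
    """Mean term-weight vector of a non-empty list of documents."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def score(doc, pos, neg):
    """Signed confidence: above 0 leans positive, below 0 leans negative."""
    return cosine(doc, centroid(pos)) - cosine(doc, centroid(neg))

def self_train(pos, neg, unlabeled, iterations=2, per_class=1, min_conf=0.1):
    """Hypothetical retraining loop with hand-set parameters (an assumption)."""
    pos, neg, pool = list(pos), list(neg), list(unlabeled)
    for _ in range(iterations):
        # Ascending order: confident negatives first, confident positives last.
        scored = sorted((score(d, pos, neg), i) for i, d in enumerate(pool))
        take_neg = [i for s, i in scored[:per_class] if s <= -min_conf]
        take_pos = [i for s, i in scored[::-1][:per_class] if s >= min_conf]
        if not take_pos and not take_neg:
            break  # nothing confident enough is left to add
        pos += [pool[i] for i in take_pos]
        neg += [pool[i] for i in take_neg]
        used = set(take_pos) | set(take_neg)
        pool = [d for i, d in enumerate(pool) if i not in used]
    return pos, neg

# Toy data, invented for this example: one seed document per class.
seeds_pos = [vectorize("python code compiler")]
seeds_neg = [vectorize("football match goal")]
pool = [vectorize(t) for t in (
    "python compiler bug",
    "goal scored in match",
    "code review python",
    "football league",
)]
pos_final, neg_final = self_train(seeds_pos, seeds_neg, pool)
```

Raising `per_class` or lowering `min_conf` admits more, and weaker, automatically labeled documents per iteration, which is the dilution risk the abstract mentions. The ratio of positive to negative additions is fixed at 1:1 in this sketch purely for simplicity.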
