Automatic Document Organization in a P2P Environment
Author(s) - Stefan Siersdorfer, Sergej Sizov
Publication year - 2006
Publication title - Lecture Notes in Computer Science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
ISBN - 3-540-33347-9
DOI - 10.1007/11735106_24
Subject(s) - computer science, construct (python library), crawling, generalization, machine learning, cluster analysis, context (archaeology), artificial intelligence, peer to peer, data mining, world wide web, medicine, mathematical analysis, paleontology, mathematics, biology, anatomy, programming language
This paper describes an efficient method for constructing reliable machine learning applications in peer-to-peer (P2P) networks by building ensemble-based meta methods. We consider this problem in the context of distributed Web exploration applications such as focused crawling. Typical applications are the user-specific classification of retrieved Web content into personalized topic hierarchies and the automatic refinement of such taxonomies using unsupervised machine learning methods (e.g., clustering). Our approach is to combine models from multiple peers and to construct an advanced decision model that takes the generalization performance of the multiple 'local' peer models into account. In addition, meta algorithms can be applied in a restrictive manner, i.e., by leaving out some 'uncertain' documents. The results of our systematic evaluation show the viability of the proposed approach.
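The idea of combining peer models restrictively can be illustrated with a minimal sketch. The abstract does not state the actual combination rule, so everything here is an assumption made for illustration: each peer exposes a binary classifier as a signed score function, each peer's generalization performance is summarized by a single reliability weight (for example, an estimated accuracy), the meta decision is a reliability-weighted vote, and a document is left out as 'uncertain' when the normalized vote falls below a threshold. The names `make_linear_model`, `restrictive_vote`, and `threshold` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical type: a peer model maps a feature vector to a signed score
# (positive = document belongs to the topic, otherwise it does not).
PeerModel = Callable[[Sequence[float]], float]


def make_linear_model(weights: Sequence[float], bias: float) -> PeerModel:
    """Build a toy linear classifier standing in for a peer's local model."""
    def score(x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return score


def restrictive_vote(
    models: List[PeerModel],
    reliabilities: List[float],
    x: Sequence[float],
    threshold: float,
) -> Optional[int]:
    """Reliability-weighted vote over peer models (assumed combination rule).

    Returns +1 or -1, or None when the normalized vote is weaker than
    `threshold`, i.e. the document is treated as 'uncertain' and left out.
    """
    total = sum(reliabilities)
    vote = sum(
        r * (1 if m(x) > 0 else -1) for m, r in zip(models, reliabilities)
    ) / total
    if abs(vote) < threshold:
        return None  # abstain on uncertain documents
    return 1 if vote > 0 else -1


# Three toy peers with assumed reliability weights.
peers = [
    make_linear_model([1.0, 0.0], 0.0),
    make_linear_model([0.0, 1.0], 0.0),
    make_linear_model([1.0, 1.0], -1.0),
]
weights = [0.9, 0.6, 0.75]

unanimous = restrictive_vote(peers, weights, [2.0, 3.0], threshold=0.5)
disputed = restrictive_vote(peers, weights, [2.0, -3.0], threshold=0.5)
```

In this sketch, raising `threshold` makes the meta classifier abstain on more documents, trading coverage for agreement among peers; setting it to 0 yields an unrestricted weighted vote.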