Peer‐to‐peer distributed text classifier learning in PADMINI
Author(s) -
Zhu Xianshu,
Mahule Tushar,
Dutta Haimonti,
Arora Sugandha,
Kargupta Hillol,
Borne Kirk
Publication year - 2012
Publication title -
Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.11155
Subject(s) - computer science, classifier (uml), the internet, information retrieval, world wide web, asynchronous communication, newspaper, data mining, artificial intelligence, computer network, advertising, business
Popular Internet document repositories, such as online newspapers, digital libraries, and blogs, store large amounts of text and image data that are frequently accessed by large numbers of users. Users' input through collaborative commenting or tagging can be very useful in organizing and classifying documents. Some web sites (e.g., Google Image Labeler) support the collection of tags and labels, but a large fraction of sites do not currently support such activities. Moreover, relying on centrally controlled web‐service providers for such support is probably not a good idea if the objective is to make the collaborative inputs publicly available: business entities offering such web‐based tagging environments often end up owning and monetizing the result of the collective effort. This paper takes a step toward addressing this problem by proposing a peer‐to‐peer (P2P) system, PADMINI, powered by distributed data mining algorithms. In particular, it focuses on learning a P2P classifier from tagged text data. The paper describes the PADMINI system and its distributed text classifier learning components; text classification is posed as a linear program, and an asynchronous distributed algorithm is used to solve it. It also presents extensive empirical results on text data obtained from the Hubble Space Telescope (HST) proposal abstract database. Copyright © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2012
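
The abstract states that text classification is posed as a linear program but does not give the formulation. As an illustration only, one standard way to write a linear classifier as a linear program is sketched below; it is an assumption for the reader's orientation, not necessarily the formulation used in the paper. The symbols are hypothetical: x_i is the feature vector of document i (for example, term frequencies), y_i in {-1, +1} is its tag-derived label, w and b define the separating hyperplane, and xi_i is a slack variable that absorbs misclassification.

```latex
\begin{aligned}
\min_{w,\, b,\, \xi} \quad & \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \qquad i = 1, \dots, n, \\
& \xi_i \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

Both the objective and the constraints are linear in (w, b, xi), so any linear programming solver applies. How such constraints would be divided among peers and solved asynchronously is specific to the paper and is not reproduced here.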