Web unit mining
Author(s) -
Aixin Sun,
Ee-Peng Lim
Publication year - 2003
Publication title -
Singapore Management University Institutional Knowledge (InK) (Singapore Management University)
Language(s) - English
Resource type - Conference proceedings
ISBN - 1-58113-723-0
DOI - 10.1145/956863.956885
Subject(s) - computer science, data web, web page, social semantic web, web modeling, web mining, semantic web stack, world wide web, information retrieval, web standards, semantic web, web development, web intelligence, web mapping, static web page
In web classification, most researchers assume that the objects to classify are individual web pages from one or more web sites. In practice, the assumption is too restrictive since a web page itself may not always correspond to a concept instance of some semantic concept (or category) given to the classification task. In this paper, we want to relax this assumption and allow a concept instance to be represented by a subgraph of web pages or a set of web pages. We identify several new issues to be addressed when the assumption is removed, and formulate the web unit mining problem. We also propose an iterative web unit mining (iWUM) method that first finds subgraphs of web pages using some knowledge about web site structure. From these web subgraphs, web units are constructed and classified into semantic concepts (or categories) in an iterative manner. Our experiments using the WebKB dataset showed that iWUM improves the overall classification performance and works very well on the more structured parts of a web site.
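The idea of treating a set of pages, rather than a single page, as one concept instance can be illustrated with a minimal sketch. This is not the iWUM method from the paper: it assumes, purely for illustration, that pages sharing a host and URL directory belong to the same candidate unit, and it performs no classification. The function name `candidate_units` and the example URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse
import posixpath


def candidate_units(urls):
    """Group page URLs by (host, directory).

    Assumption for illustration only: the URL directory is used as a crude
    stand-in for "knowledge about web site structure".
    """
    groups = defaultdict(list)
    for url in urls:
        parts = urlparse(url)
        # Directory part of the path; fall back to "/" for top-level pages.
        directory = posixpath.dirname(parts.path) or "/"
        groups[(parts.netloc, directory)].append(url)
    # Sort the pages in each group so the result is deterministic.
    return {key: sorted(pages) for key, pages in groups.items()}


# Hypothetical pages from a university site.
pages = [
    "http://cs.example.edu/~alice/index.html",
    "http://cs.example.edu/~alice/pubs.html",
    "http://cs.example.edu/courses/cs101/index.html",
    "http://cs.example.edu/courses/cs101/syllabus.html",
    "http://cs.example.edu/courses/cs101/hw1.html",
]
units = candidate_units(pages)
for (host, directory), members in units.items():
    print(host, directory, len(members))
```

Here the two pages under `/~alice` form one candidate unit and the three pages under `/courses/cs101` form another, so a classifier could then label each group as a whole instead of labeling each page separately.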