Accelerated focused crawling through online relevance feedback
Author(s) - Soumen Chakrabarti, Kunal Punera, Subramanyam Mallela
Publication year - 2002
Publication title - CiteSeerX (The Pennsylvania State University)
Language(s) - English
Resource type - Conference proceedings
ISBN - 1-58113-449-5
DOI - 10.1145/511446.511466
Subject(s) - crawling, computer science, relevance (law), information retrieval, world wide web, frontier, web crawler, resource (disambiguation), tree (set theory), web page, relevance feedback, artificial intelligence, image retrieval, mathematics, medicine, political science, law, image (mathematics), anatomy, history, computer network, mathematical analysis, archaeology
The organization of HTML into a tag tree structure, which browsers render as roughly rectangular regions with embedded text and HREF links, greatly helps surfers locate and click on the links that best satisfy their information need. Can an automatic program emulate this human behavior and thereby learn to predict the relevance of an unseen HREF target page with respect to an information need, based only on information available in the HREF source page? Such a capability would be of great interest in focused crawling and resource discovery, because it can fine-tune the priority of unvisited URLs in the crawl frontier and reduce the number of irrelevant pages that are fetched and then discarded.
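The idea of prioritizing unvisited URLs in a crawl frontier can be illustrated with a minimal Python sketch. It is not the authors' method: `CrawlFrontier` and `score_link` are hypothetical names introduced here, and the term-overlap score is only a stand-in for whatever learned predictor estimates target relevance from the source page.

```python
import heapq
import itertools


def score_link(context_text, topic_terms):
    """Hypothetical stand-in scorer: fraction of topic terms that
    appear in the text surrounding a link on the source page."""
    if not topic_terms:
        return 0.0
    words = set(context_text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


class CrawlFrontier:
    """Priority queue of unvisited URLs, highest estimated relevance first."""

    def __init__(self):
        self._heap = []                    # entries: (-score, tie_breaker, url)
        self._best = {}                    # url -> best score seen so far
        self._visited = set()              # URLs already handed out by pop()
        self._counter = itertools.count()  # tie-breaker keeps ordering stable

    def add(self, url, score):
        """Insert a URL, or raise its priority if a better score arrives."""
        if url in self._visited:
            return
        if score > self._best.get(url, float("-inf")):
            self._best[url] = score
            heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self):
        """Return (url, score) for the most promising unvisited URL."""
        while self._heap:
            neg_score, _, url = heapq.heappop(self._heap)
            # Skip stale entries left behind by an earlier, lower score.
            if self._best.get(url) == -neg_score:
                del self._best[url]
                self._visited.add(url)
                return url, -neg_score
        raise IndexError("frontier is empty")

    def __len__(self):
        return len(self._best)


# Usage: links whose context matches the topic are fetched first.
topic = {"focused", "crawler"}
frontier = CrawlFrontier()
frontier.add("http://example.org/a", score_link("focused crawler tutorial", topic))
frontier.add("http://example.org/b", score_link("cooking recipes", topic))
frontier.add("http://example.org/c", score_link("web crawler basics", topic))
first_url, first_score = frontier.pop()
```

In this sketch a low-scoring URL such as `http://example.org/b` stays at the back of the queue, so a crawler with a limited fetch budget may never download it, which is the saving the abstract describes.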