SOF: a semi‐supervised ontology‐learning‐based focused crawler
Author(s) - Dong Hai, Hussain Farookh Khadeer
Publication year - 2012
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.2980
Subject(s) - web crawler , computer science , ontology , focused crawler , world wide web , information retrieval , semantic web , ontology learning , ontology based data integration , upper ontology , owl s , ontology inference layer , web page , social semantic web , suggested upper merged ontology , web development , static web page , philosophy , epistemology
SUMMARY The rapid increase in the volume of data available on the Internet makes it increasingly impractical for a crawler to index the whole Web. Instead, many intelligent crawlers, known as ontology‐based semantic focused crawlers, have been designed by making use of Semantic Web technologies for topic‐centered Web information crawling. Ontologies, however, have constraints of validity and time, which may influence the performance of the crawlers. Ontology‐learning‐based focused crawlers are therefore designed to automatically evolve ontologies by integrating ontology learning technologies. Nevertheless, surveys indicate that the existing ontology‐learning‐based focused crawlers do not have the capability to automatically enrich the content of ontologies, which makes these crawlers unreliable in the open and heterogeneous Web environment. Hence, in this paper, we propose a framework for a novel semi‐supervised ontology‐learning‐based focused (SOF) crawler, which embodies a series of schemas for ontology generation and Web information formatting, a semi‐supervised ontology learning framework, and a hybrid Web page classification approach aggregated by a group of support vector machine models. A series of tests is implemented to evaluate the technical feasibility of this proposed framework. The conclusion and the future work are summarized in the final section. Copyright © 2012 John Wiley & Sons, Ltd.
