Research on discovering deep web entries
Author(s) -
Ying Wang,
Huilai Li,
Wanli Zuo,
Fengling He,
Xin Wang,
Kerui Chen
Publication year - 2011
Publication title -
computer science and information systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.244
H-Index - 24
eISSN - 2406-1018
pISSN - 1820-0214
DOI - 10.2298/csis100322028w
Subject(s) - computer science , web crawler , information retrieval , classifier (uml) , crawling , web page , ontology , domain (mathematical analysis) , semantic web , deep web , world wide web , focused crawler , artificial intelligence , static web page , the internet , web navigation , medicine , mathematical analysis , philosophy , mathematics , epistemology , anatomy
Ontology plays an important role in locating Domain-Specific Deep Web contents, therefore, this paper presents a novel framework WFF for efficiently locating Domain-Specific Deep Web databases based on focused crawling and ontology by constructing Web Page Classifier(WPC), Form Structure Classifier(FSC) and Form Content Classifier(FCC) in a hierarchical fashion. Firstly, WPC discovers potentially interesting pages based on ontology-assisted focused crawler. Then, FSC analyzes the interesting pages and determines whether these pages subsume searchable forms based on structural characteristics. Lastly, FCC identifies searchable forms that belong to a given domain in the semantic level, and stores these URLs of Domain- Specific searchable forms to a database. Through a detailed experimental evaluation, WFF framework not only simplifies discovering process, but also effectively determines Domain-Specific databases.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom