Open Access
An XML based Web Crawler with Page Revisit Policy and Updation in Local Repository of Search Engine
Author(s) -
Jyoti Mor,
Dinesh Rai,
Naresh Kumar
Publication year - 2018
Publication title -
International Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
ISSN - 2227-524X
DOI - 10.14419/ijet.v7i3.12924
Subject(s) - web crawler , xml , computer science , information retrieval , world wide web , search engine , database
In a large collection of web pages, it is difficult for search engines to keep their online repository updated. Major search engines deploy hundreds of web crawlers that crawl the WWW day and night and send the downloaded web pages over a network to be stored in the search engine’s database. This results in over-utilization of shared network resources such as bandwidth and CPU cycles. This paper proposes an architecture that reduces the utilization of shared network resources with the help of an advanced XML-based approach. The focused-crawling architecture is trained to download only high-quality data from the Internet, leaving behind web pages that are not relevant to the desired domain. A detailed layout of the proposed system is described, which is capable of reducing the load on the network as well as the problems arising from the residency of a mobile agent at the remote server.
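The core ideas in the abstract, downloading only pages relevant to a target domain and scheduling revisits so the local repository stays fresh, can be sketched in a few lines. The keyword-overlap relevance score, the threshold, and the adaptive revisit interval below are all illustrative assumptions, not the paper's actual trained model or policy:

```python
from dataclasses import dataclass, field
import heapq

# Hypothetical relevance filter: a page counts as "on-topic" if enough of
# its terms overlap a target-domain keyword set. The paper's focused
# crawler is trained on the desired domain; this overlap score is only a
# stand-in for that classifier.
DOMAIN_KEYWORDS = {"crawler", "search", "index", "xml", "repository"}

def relevance(text: str, keywords=DOMAIN_KEYWORDS) -> float:
    terms = set(text.lower().split())
    return len(terms & keywords) / len(keywords)

@dataclass(order=True)
class Revisit:
    due: float                               # next scheduled revisit time
    url: str = field(compare=False)
    interval: float = field(compare=False)   # seconds between revisits

class FocusedCrawler:
    """Stores only pages above a relevance threshold in the local
    repository and schedules page revisits, so irrelevant pages never
    consume shared bandwidth or CPU cycles."""

    def __init__(self, threshold: float = 0.4, base_interval: float = 3600.0):
        self.threshold = threshold
        self.base_interval = base_interval
        self.repository: dict[str, str] = {}   # local repository: url -> content
        self.schedule: list[Revisit] = []      # min-heap of pending revisits

    def crawl(self, url: str, text: str, now: float) -> bool:
        if relevance(text) < self.threshold:
            return False                       # skip: not in the desired domain
        changed = self.repository.get(url) != text
        self.repository[url] = text
        # Adaptive revisit policy (an assumption): pages that changed since
        # the last visit are rescheduled sooner, stable pages later.
        interval = self.base_interval * (0.5 if changed else 2.0)
        heapq.heappush(self.schedule, Revisit(now + interval, url, interval))
        return True
```

A real deployment would fetch pages over HTTP and exchange an XML summary with the remote agent instead of the full page text, which is where the bandwidth saving described in the abstract would come from.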
