Open Access
An XML based Web Crawler with Page Revisit Policy and Updation in Local Repository of Search Engine
Author(s) -
Jyoti Mor,
Dinesh Rai,
Naresh Kumar
Publication year - 2018
Publication title -
International Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
ISSN - 2227-524X
DOI - 10.14419/ijet.v7i3.12924
Subject(s) - web crawler , xml , computer science , information retrieval , world wide web , search engine , database
In a large collection of web pages, it is difficult for search engines to keep their online repository updated. Major search engines deploy hundreds of web crawlers that crawl the WWW day and night and send the downloaded web pages over a network to be stored in the search engine’s database. This results in over-utilization of shared network resources such as bandwidth and CPU cycles. This paper proposes an architecture that reduces the utilization of shared network resources with the help of an advanced XML-based approach. The focused-crawling architecture is trained to download only high-quality data from the Internet, leaving behind web pages that are not relevant to the desired domain. A detailed layout of the proposed system is described, which is capable of reducing the load on the network as well as the problems arising from the residency of a mobile agent at the remote server.
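The core ideas in the abstract, downloading only pages relevant to a target domain and scheduling revisits so the local repository stays fresh, can be sketched in a few lines. The keyword-overlap relevance score, the threshold, and the adaptive revisit interval below are all illustrative assumptions, not the paper's actual trained model or policy:

```python
from dataclasses import dataclass, field
import heapq

# Hypothetical relevance filter: a page counts as "on-topic" if enough of
# its terms overlap a target-domain keyword set. The paper's focused
# crawler is trained on the desired domain; this overlap score is only a
# stand-in for that classifier.
DOMAIN_KEYWORDS = {"crawler", "search", "index", "xml", "repository"}

def relevance(text: str, keywords=DOMAIN_KEYWORDS) -> float:
    terms = set(text.lower().split())
    return len(terms & keywords) / len(keywords)

@dataclass(order=True)
class Revisit:
    due: float                               # next scheduled revisit time
    url: str = field(compare=False)
    interval: float = field(compare=False)   # seconds between revisits

class FocusedCrawler:
    """Stores only pages above a relevance threshold in the local
    repository and schedules page revisits, so irrelevant pages never
    consume shared bandwidth or CPU cycles."""

    def __init__(self, threshold: float = 0.4, base_interval: float = 3600.0):
        self.threshold = threshold
        self.base_interval = base_interval
        self.repository: dict[str, str] = {}   # local repository: url -> content
        self.schedule: list[Revisit] = []      # min-heap of pending revisits

    def crawl(self, url: str, text: str, now: float) -> bool:
        if relevance(text) < self.threshold:
            return False                       # skip: not in the desired domain
        changed = self.repository.get(url) != text
        self.repository[url] = text
        # Adaptive revisit policy (an assumption): pages that changed since
        # the last visit are rescheduled sooner, stable pages later.
        interval = self.base_interval * (0.5 if changed else 2.0)
        heapq.heappush(self.schedule, Revisit(now + interval, url, interval))
        return True
```

A real deployment would fetch pages over HTTP and exchange an XML summary with the remote agent instead of the full page text, which is where the bandwidth saving described in the abstract would come from.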
