Design and Implementation of a Web Crawler System based on an Adaptive Page-Rank algorithm

Xin Zhang, Zhi Feng Cheng, Chen Zhang

Open Access

Publication year - 2020

Publication title - Journal of Physics: Conference Series

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.21

H-Index - 85

eISSN - 1742-6596

pISSN - 1742-6588

DOI - 10.1088/1742-6596/1634/1/012021

Subject(s) - web crawler, computer science, web page, focused crawler, information retrieval, crawling, search engine, static web page, precision and recall, Python (programming language), dynamic web page, rank (graph theory), world wide web, algorithm, web navigation, programming language, mathematics, medicine, combinatorics, anatomy

Web crawlers can automatically extract web page information, but some pages repeat keywords to inflate their search rankings. To address this issue, we propose an adaptive Page-rank algorithm and build a crawler system around it. Specifically, we generate a relationship matrix from the access relationships among the crawled web pages, then iteratively generate a probability matrix based on the number of web pages, and finally display the crawled pages in descending order of their calculated weights. In addition, we propose controlling the iterative process in Page-rank using the coherence of anchor texts. The system implements its web crawling functions in Python. Experimental results demonstrate that the system collects data at high speed. Compared with Hints and classical Page-rank crawler systems, the proposed method achieves better precision and recall.
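The pipeline described in the abstract (relationship matrix, iterated probability matrix, descending sort by weight) can be illustrated with a minimal Python sketch of the classical Page-rank iteration. This is not the authors' implementation: the function names, the damping factor of 0.85, the convergence tolerance, and the example link graph are all assumptions made for illustration. The adaptive control based on anchor-text coherence is not specified in the abstract, so it is not sketched here.

```python
def build_link_matrix(links, pages):
    """Build a column-stochastic matrix from crawled link relationships.

    matrix[i][j] is the probability of moving from page j to page i.
    `links` maps each page to the list of pages it links to (hypothetical input format).
    """
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}
    matrix = [[0.0] * n for _ in range(n)]
    for j, source in enumerate(pages):
        targets = [t for t in links.get(source, []) if t in index]
        if not targets:
            # Assumption: a page with no outgoing links spreads its weight uniformly.
            for i in range(n):
                matrix[i][j] = 1.0 / n
        else:
            for target in targets:
                matrix[index[target]][j] += 1.0 / len(targets)
    return matrix


def pagerank(matrix, damping=0.85, tol=1e-9, max_iter=100):
    """Iterate the rank vector until the total change falls below `tol`.

    damping, tol, and max_iter are assumed values, not taken from the paper.
    """
    n = len(matrix)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new_rank = [
            (1.0 - damping) / n
            + damping * sum(matrix[i][j] * rank[j] for j in range(n))
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


def rank_pages(links):
    """Return (page, weight) pairs in descending order of weight."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    scores = pagerank(build_link_matrix(links, pages))
    return sorted(zip(pages, scores), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical four-page link graph standing in for crawled data.
    example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, weight in rank_pages(example_links):
        print(page, round(weight, 4))
```

Because the ranking depends on link structure and not on how often a keyword appears in a page, repeating keywords within a page does not raise its weight in this scheme.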
