Open Access
Design of topic Web crawler based on improved PageRank algorithm
Author(s) - Linxuan Yu, Yeli Li, Qingtao Zeng
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1754/1/012210
Subject(s) - web crawler, pagerank, computer science, focused crawler, relevance (law), information retrieval, search engine, the internet, sorting, data mining, big data, similarity (geometry), web search engine, crawling, link analysis, hyperlink, web page, world wide web, algorithm, static web page, web navigation, web search query, artificial intelligence, medicine, anatomy, political science, law, image (mathematics)
With the continuing development of network information technology, the Internet has filled with large volumes of unstructured data of every kind, commonly called big data. This data is not easily stored in a local database, so retrieving useful information from the Internet efficiently has become essential. Because gathering information by hand is laborious, web crawler technology emerged. However, existing search engines still have shortcomings in topic similarity judgment and in their web page sorting algorithms. This paper therefore applies the PageRank algorithm to a topic crawler to construct a vertical search engine and, to address the shortcomings of PageRank, introduces a topic relevance factor that suppresses "topic drift".
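
The abstract does not state the improved formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each page already has a topic relevance score in [0, 1] (for example from a text similarity measure) and that a page passes its rank to outbound links in proportion to the relevance of each target, rather than evenly as in standard PageRank. The function name `topic_pagerank`, its parameters, and the sample graph are all hypothetical.

```python
def topic_pagerank(links, relevance, damping=0.85, iterations=50):
    """Relevance-weighted PageRank sketch (an assumption, not the paper's formula).

    links:     dict mapping a page to the list of pages it links to
    relevance: dict mapping a page to an assumed topic relevance score in [0, 1]
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the uniform "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for src in pages:
            targets = links.get(src, [])
            if not targets:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[src] / n
                continue
            weights = [relevance.get(t, 0.0) for t in targets]
            total = sum(weights)
            if total == 0:
                # No relevant target: fall back to the standard even split.
                weights = [1.0] * len(targets)
                total = float(len(targets))
            for t, w in zip(targets, weights):
                # Off-topic targets receive a smaller share of the rank.
                new_rank[t] += damping * rank[src] * w / total
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-page graph: one hub linking to two pages.
    links = {
        "hub": ["on_topic", "off_topic"],
        "on_topic": ["hub"],
        "off_topic": ["hub"],
    }
    relevance = {"hub": 0.5, "on_topic": 0.9, "off_topic": 0.1}
    for page, score in sorted(topic_pagerank(links, relevance).items()):
        print(page, round(score, 4))
```

In this sketch the off-topic page still receives the uniform jump share but inherits far less rank from the hub, which is one plausible way a relevance factor could keep a crawler's queue from drifting toward unrelated pages.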
