A Method of Collecting Mongolian Web page Based on Hyperlink Correlation Degree
Author(s) - Zhiqiang Ma, Rui Yan, Zeguang Zhang, Shuangtao Yang
Publication year - 2015
Publication title - International Journal of Control and Automation
Language(s) - English
Resource type - Journals
eISSN - 2207-6387
pISSN - 2005-4297
DOI - 10.14257/ijca.2015.8.11.34
Subject(s) - hyperlink , degree (music) , computer science , correlation , information retrieval , web page , world wide web , mathematics , geometry , physics , acoustics
Because the encoding of Mongolian web pages is not unified and such pages are relatively few in number, a collection method that combines a linguistic model with hyperlink analysis is designed to address the problem. First, the language of each web page is identified with an N-Gram language model, and the average distance obtained from language identification is used as one component of the hyperlink correlation degree. Second, the hyperlink correlation degree is calculated from the anchor text, the hyperlink increase, and the hyperlink depth. Finally, the hyperlinks are sorted by hyperlink correlation degree and become the seeds for collecting the next web pages. The experimental results show that collecting Mongolian web pages based on hyperlink correlation degree effectively improves the amount of information collected, the collection speed, and the accuracy rate.
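
The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything below is an assumption made for illustration: the `Link` fields, the normalization of each feature to [0, 1], the weighted-sum form, and the weight values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A candidate hyperlink with hypothetical, pre-computed features."""
    url: str
    anchor_score: float   # assumed: how relevant the anchor text looks, in [0, 1]
    lang_distance: float  # assumed: average N-gram distance to a Mongolian profile, in [0, 1]
    increase: float       # assumed: normalized hyperlink-increase term, in [0, 1]
    depth: int            # link depth counted from the starting seed (0 = seed)


def correlation_degree(link, weights=(0.4, 0.3, 0.2, 0.1)):
    """Illustrative weighted sum; the weights are placeholders, not the paper's."""
    w_anchor, w_lang, w_increase, w_depth = weights
    return (
        w_anchor * link.anchor_score
        + w_lang * (1.0 - link.lang_distance)  # a smaller distance scores higher
        + w_increase * link.increase
        + w_depth / (1 + link.depth)           # shallower links score higher
    )


def rank_links(links):
    """Return links sorted so the highest-scoring ones are collected first."""
    return sorted(links, key=correlation_degree, reverse=True)


if __name__ == "__main__":
    candidates = [
        Link("http://example.com/b", 0.2, 0.8, 0.5, 1),
        Link("http://example.com/a", 0.9, 0.1, 0.5, 1),
    ]
    for link in rank_links(candidates):
        print(link.url, round(correlation_degree(link), 2))
```

In this sketch the sorted list plays the role of the next round of collecting seeds. A real crawler would more likely keep the candidates in a priority queue and re-score them as new pages are fetched.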
