Open Access
OWGC-HMC: An Online Web Genre Classification Model Based on Hierarchical Multilabel Classification
Author(s) - Guozhong Dong, Weizhe Zhang, Rahul Yadav, Xin Mu, Zhili Zhou
Publication year - 2022
Publication title - Security and Communication Networks
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.446
H-Index - 43
eISSN - 1939-0114
pISSN - 1939-0122
DOI - 10.1155/2022/7549880
Subject(s) - computer science, information retrieval, crawling, web page, web mining, world wide web, the internet, artificial intelligence, medicine, anatomy
Web genre plays an important role in focused crawling, web link analysis, and contextual advertising. In this paper, web genre is defined as the functional purpose of a website and the type of information it contains. Classifying web genre automatically makes it possible to predict the content and functional type of a website. However, two critical challenges stand in the way: the lack of a web genre classification dataset and the lack of an efficient web genre classification mechanism. To improve web genre classification performance, we crawled Chinese websites of different web genres and converted the crawled data into a hierarchical multilabel classification dataset. A website knowledge graph is constructed from the relationships between websites and their meta tag features. Using entity features extracted from the knowledge graph, we propose an online web genre classification model based on hierarchical multilabel classification (OWGC-HMC) to mine the functional purpose of the corresponding website. Experimental results show that our OWGC-HMC model can mine the hierarchical multilabel structure of web genre and outperforms other web genre classification methods.
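
To make the idea of a hierarchical multilabel dataset concrete, the following is a minimal sketch of one common way to encode such labels: each website's genre labels are closed upward so that every ancestor genre is also marked, and the result is turned into a multi-hot vector. The genre names, the `PARENT` mapping, and the helper functions are hypothetical and chosen only for illustration; the abstract does not describe the authors' actual label set or encoding.

```python
from typing import Dict, List, Optional, Set

# Hypothetical genre hierarchy (child -> parent); None marks a top-level genre.
# These names are illustrative assumptions, not the paper's label set.
PARENT: Dict[str, Optional[str]] = {
    "commerce": None,
    "media": None,
    "online_shop": "commerce",
    "payment": "commerce",
    "news": "media",
    "video": "media",
}

# Fixed label order used for the multi-hot encoding.
LABELS: List[str] = sorted(PARENT)


def with_ancestors(labels: Set[str]) -> Set[str]:
    """Close a label set upward so every label's ancestors are included."""
    closed: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Walk up the tree; stop early if this branch was already covered.
        while node is not None and node not in closed:
            closed.add(node)
            node = PARENT[node]
    return closed


def encode(labels: Set[str]) -> List[int]:
    """Return a multi-hot vector over LABELS that respects the hierarchy."""
    closed = with_ancestors(labels)
    return [1 if name in closed else 0 for name in LABELS]


if __name__ == "__main__":
    # A site tagged as both an online shop and a news site.
    print(LABELS)
    print(encode({"online_shop", "news"}))
```

Closing labels upward keeps every target vector consistent with the hierarchy, so a site can never be marked with a child genre while its parent genre is unmarked. A flat multilabel encoding does not guarantee this.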
