Hierarchical Contaminated Web Page Classification Based on Meta Tag Denoising Disposal
Author(s) - Xiang Song, Yi Zhu, Xuemei Zeng, Xingshu Chen
Publication year - 2021
Publication title - Security and Communication Networks
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.446
H-Index - 43
eISSN - 1939-0114
pISSN - 1939-0122
DOI - 10.1155/2021/2470897
Subject(s) - computer science , web page , information retrieval , noise reduction , data mining , noise (video) , world wide web , artificial intelligence , image (mathematics)
Web page classification is critical for information retrieval. Most web page classification methods have two shortcomings: (1) they need to analyze the whole web page, and (2) they pay too little attention to the noise information inside the page. Both reduce efficiency and classification performance, especially on contaminated web pages. To address these problems, this paper proposes a meta tag denoising disposal algorithm. We adopt a top-down method for hierarchical classification to improve prediction efficiency. Experimental results show that our method is about 7 times faster than the full-page method and achieves good classification results in most categories. The precision for all 7 parent categories is above 88% and is, on average, 24% higher than that of the other meta tag-based method.
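The general idea of classifying from meta tags with a top-down hierarchy can be illustrated with a minimal sketch. It does not reproduce the paper's denoising disposal algorithm or its classifier, which the abstract does not specify. The `HIERARCHY` table, the keyword-overlap scoring, and the names `MetaTagExtractor` and `classify_top_down` are all hypothetical and chosen only for illustration.

```python
import re
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collect the <title> text and the description/keywords meta tags only."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords") and attrs.get("content"):
                self.fields[name] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Body text is ignored; only text inside <title> is kept.
        if self._in_title:
            self.fields["title"] = self.fields.get("title", "") + data


# Hypothetical two-level category tree with toy keyword sets (an assumption).
HIERARCHY = {
    "sports": {
        "keywords": {"football", "league", "match", "tennis"},
        "children": {"football": {"football", "goal"}, "tennis": {"tennis", "racket"}},
    },
    "finance": {
        "keywords": {"stock", "bank", "loan"},
        "children": {"banking": {"bank", "loan"}, "stocks": {"stock", "shares"}},
    },
}


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_label(words, candidates):
    """Return the label whose keyword set overlaps most with `words`."""
    return max(candidates, key=lambda label: len(words & candidates[label]))


def classify_top_down(html):
    """Pick a parent category first, then choose only among its children."""
    parser = MetaTagExtractor()
    parser.feed(html)
    words = tokens(" ".join(parser.fields.values()))
    parent = best_label(words, {p: n["keywords"] for p, n in HIERARCHY.items()})
    child = best_label(words, HIERARCHY[parent]["children"])
    return parent, child
```

Because the second step scores only the children of the chosen parent, each prediction compares against a small part of the category tree, and because only the title and meta tags are read, the page body never has to be processed. A real system would replace the keyword overlap with trained classifiers at each level.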