Open Access
A Systematic Review of Current Trends in Web Content Mining
Author(s) -
Makinde Opeyemi Samuel,
Afolabi Ibukun Tolulope,
Oladipupo Olufunke Oyejoke
Publication year - 2019
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1299/1/012040
Subject(s) - computer science , hyperlink , relevance (law) , ranking (information retrieval) , information retrieval , data science , web mining , web content , web page , object (grammar) , content analysis , world wide web , data extraction , data mining , artificial intelligence , social science , medline , sociology , political science , law
Knowledge in web documents, relevance ranking of webpages and similar topics are among the under-researched areas in web content mining (WCM). Apart from the general data mining tools used for knowledge discovery on the web, there have been few attempts at reviewing WCM. These approached it from the perspective of the methods used and the problems solved, but not in sufficient depth. The existing reviews also do not reveal which problems have been under-researched or which application areas have received the most attention in WCM. The goal of this systematic review is to provide a comprehensive, semi-structured overview of WCM methods, the problems they address and the solutions proposed. To this end, 57 publications (journal articles, conference proceedings and workshop papers) published between 1999 and 2018 were reviewed. The findings reveal that challenges such as updating dynamic content, efficient content extraction and eliminating noise blocks remain the most prominent in WCM, and that research attention is strongly focused on solving these problems more efficiently. Most of the proposed solutions still have limitations, which makes this a fertile area for future research. The techniques used, ranging from caching dynamic web data to content extraction, include Data Update Propagation (DUP), association rules, Object Dependence Graphs, classification techniques, the Document Object Model, Vision-Based Segmentation, Hyperlink-Induced Topic Search (HITS) and so on. Finally, the study reveals that WCM has mostly been applied to general websites, that is, arbitrary webpages from which specific parameters are to be extracted. The review identifies the limitations of current research on the subject and highlights future research opportunities in WCM.
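Hyperlink-Induced Topic Search (HITS) is one of the techniques listed above and bears on the relevance-ranking problem the abstract mentions. The abstract does not describe how any surveyed paper applies it. The following is only a minimal sketch of the textbook HITS power iteration, not the method of any reviewed publication. The function name `hits`, its parameters and the toy link graph are hypothetical, chosen for illustration. Authority scores sum the hub scores of pages linking in. Hub scores sum the authority scores of pages linked to. Both are L2-normalised after each round.

```python
import math

def hits(graph, iterations=50, tol=1e-8):
    """Hypothetical sketch of HITS hub/authority scoring.

    `graph` maps each page to the list of pages it links to.
    Returns (hubs, authorities) as dicts of L2-normalised scores.
    """
    # Collect every page, including pages that only receive links.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # Authority update: sum of hub scores of pages linking in.
        new_auths = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                new_auths[dst] += hubs[src]

        # Hub update: sum of the new authority scores of pages linked to.
        new_hubs = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            new_hubs[src] = sum(new_auths[dst] for dst in targets)

        # L2-normalise both vectors (guard against an all-zero vector).
        a_norm = math.sqrt(sum(v * v for v in new_auths.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        new_auths = {n: v / a_norm for n, v in new_auths.items()}
        new_hubs = {n: v / h_norm for n, v in new_hubs.items()}

        # Stop early once the scores have stabilised.
        delta = sum(abs(new_auths[n] - auths[n]) + abs(new_hubs[n] - hubs[n])
                    for n in nodes)
        auths, hubs = new_auths, new_hubs
        if delta < tol:
            break

    return hubs, auths


if __name__ == "__main__":
    # Hypothetical toy link graph: page -> pages it links to.
    toy_graph = {"A": ["C"], "B": ["C", "D"], "D": ["C"]}
    hubs, auths = hits(toy_graph)
    print("top authority:", max(auths, key=auths.get))
    print("top hub:", max(hubs, key=hubs.get))
```

In this toy graph, page `C` is linked to by every other page, so it should come out as the strongest authority. Page `B` links to both `C` and `D`, so it should come out as the strongest hub. In practice, HITS is typically run on a query-specific subgraph of pages rather than on the whole web.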
