Towards Extended Data Mining: An Examination of Technical Aspects
Author(s) -
Lakshmi Prasanna Kaspa,
Venkata Naga Sai Sriram Akella,
Zhengxin Chen,
Yong Shi
Publication year - 2018
Publication title -
Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.10.216
Subject(s) - computer science, crawling, web mining, data mining, data science, data stream mining, set (abstract data type), concept mining, web crawler, knowledge extraction, information retrieval, world wide web, web page, medicine, anatomy, programming language
Data mining has been an active research area for a couple of decades, yet its complicated nature is still not fully understood. One common misunderstanding is: "Give me the data set, and data mining tools will show me the hidden knowledge." This thinking is naive and unrealistic in many real-world applications. In this paper, we explore extended data mining, whose ultimate goal is to automatically collect additional data when they are needed for effective data mining. Existing web crawling and scraping techniques can be incorporated, but additional steps are still needed. We examine important technical aspects of extended data mining via web crawling and scraping.
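The abstract describes collecting additional data through web crawling and scraping when the data at hand are insufficient. Below is a minimal sketch of that idea using only the Python standard library. It is not the authors' method: the function names (`crawl`, `extend_dataset`), the same-host restriction, the page limit, and the size-based trigger (`min_size`) are hypothetical choices made for illustration. The `fetch` function is injectable so the crawl can be exercised offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects link targets and non-empty text fragments from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def default_fetch(url):
    # Network fetch; a real crawler should also honour robots.txt and rate limits.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seed, fetch=default_fetch, max_pages=10):
    """Hypothetical breadth-first crawl limited to the seed's host; returns {url: text}."""
    host = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}
    records = {}
    while queue and len(records) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip unreachable pages instead of aborting the crawl
        parser = PageParser()
        parser.feed(html)
        parser.close()  # flush any text buffered after the last tag
        records[url] = " ".join(parser.text)  # the "scraped" content of the page
        for href in parser.links:
            link, _ = urldefrag(urljoin(url, href))  # absolute URL without #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return records


def extend_dataset(dataset, seed, min_size, fetch=default_fetch):
    """Hypothetical trigger: crawl for more records only when the dataset is too small."""
    if len(dataset) >= min_size:
        return dataset
    extra = crawl(seed, fetch=fetch, max_pages=min_size - len(dataset))
    return dataset + list(extra.values())
```

Breadth-first order keeps the collected pages close to the seed, which is one simple way to favour relevant data; the paper's point that "additional steps are still needed" (for example, deciding what data are missing and whether scraped text is usable) is outside the scope of this sketch.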