
Data Extraction and Scratching Information Using R
Author(s) -
G Midhu Bala,
K. Chitra
Publication year - 2021
Publication title -
Shanlax International Journal of Arts, Science and Humanities (Online)
Language(s) - English
Resource type - Journals
ISSN - 2582-0397
DOI - 10.34293/sijash.v8i3.3588
Subject(s) - computer science, web intelligence, semantic web, document object model, web page, field (mathematics), world wide web, information extraction, information retrieval, process (computing), analytics, semantic web stack, web modeling, data mining, programming language, mathematics, pure mathematics
Web scraping is the process of automatically extracting information from multiple web pages on the World Wide Web. It is an actively developing field that shares goals with text processing, the semantic web vision, semantic understanding, machine learning, artificial intelligence, and human-computer interaction. Current web scraping solutions range from ad hoc approaches that require human effort to fully automated systems that can extract the required unstructured information and convert it into structured information, with limitations. This paper describes a method for developing a web scraper in R that locates files on a website, extracts the filtered data, and stores it. The paper also describes the modules used and the algorithm for automating navigation of a website via its links. The extracted data can then be used for data analytics.
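
The abstract does not reproduce the paper's code, so the following is only a minimal sketch of the general idea it describes: follow links from page to page, keep the links that point to files of interest, and store the result. Everything named here is an assumption made for illustration: the in-memory `site` list stands in for a real website, and `extract_links` and `crawl_for_files` are hypothetical helpers, not the authors' functions. The sketch uses base R only and pulls links out with a regular expression; a real scraper would more likely fetch pages over HTTP and parse them with an HTML parser such as the rvest or xml2 packages.

```r
# Hypothetical in-memory "website": page path -> HTML source.
# A real scraper would download each page over HTTP instead.
site <- list(
  "/index.html"   = '<a href="/reports.html">Reports</a> <a href="/about.html">About</a>',
  "/reports.html" = '<a href="/data/2020.csv">2020</a> <a href="/data/2021.csv">2021</a> <a href="/index.html">Home</a>',
  "/about.html"   = '<a href="/files/readme.pdf">Readme</a>'
)

# Pull every href value out of an HTML string.
# (A regex is a simplification; it stands in for a proper HTML parser.)
extract_links <- function(html) {
  hits <- regmatches(html, gregexpr('href="[^"]*"', html))[[1]]
  sub('^href="([^"]*)"$', "\\1", hits)
}

# Breadth-first crawl starting at `start`, collecting every link
# whose target matches the regular expression `pattern`.
crawl_for_files <- function(site, start, pattern) {
  queue   <- start
  visited <- character(0)
  found   <- character(0)
  while (length(queue) > 0) {
    page  <- queue[1]
    queue <- queue[-1]
    # Skip pages already processed and targets that are not known pages.
    if (page %in% visited || is.null(site[[page]])) next
    visited <- c(visited, page)
    links   <- extract_links(site[[page]])
    is_file <- grepl(pattern, links)
    found   <- union(found, links[is_file])                # filtered data
    queue   <- c(queue, setdiff(links[!is_file], visited)) # pages still to visit
  }
  found
}

# Locate all CSV files reachable from the start page and store the list.
csv_files <- crawl_for_files(site, "/index.html", "\\.csv$")
out_path  <- file.path(tempdir(), "found_files.csv")
write.csv(data.frame(file = csv_files, stringsAsFactors = FALSE),
          out_path, row.names = FALSE)
```

Keeping a `visited` set is what prevents the crawl from looping when pages link back to each other, as `/reports.html` does to `/index.html` here. Changing `pattern` (for example to `"\\.pdf$"`) changes which files are kept without touching the navigation logic.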