Open Access
WEB SCALE INFORMATION EXTRACTION USING WRAPPER INDUCTION APPROACH
Author(s) -
Rina Zambad,
Jayant Gadge
Publication year - 2014
Publication title -
International Journal of Electrical and Electronics Engineering
Language(s) - English
Resource type - Journals
ISSN - 2231-5284
DOI - 10.47893/ijeee.2014.1121
Subject(s) - computer science, parsing, information extraction, information retrieval, unstructured data, scale (ratio), data extraction, web page, data mining, database, natural language processing, world wide web, big data, physics, medline, quantum mechanics, political science, law
Information extraction from unstructured, ungrammatical data such as classified listings is difficult because traditional structural and grammatical extraction methods do not apply. The proposed architecture extracts unstructured, ungrammatical data using wrapper induction and presents the results in a structured format. The source data are collected from various post websites. The collected post pages are processed by page parsing, cleansing, and data extraction to obtain new reference sets. These reference sets are used to map the user's search query, which improves the scale of search over unstructured, ungrammatical post data. We validate our approach with experimental results.
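The pipeline described in the abstract (wrapper induction, extraction, cleansing, reference-set construction, query mapping) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple left/right-delimiter wrapper learned from a few labeled pages, and every name in it (`induce_wrapper`, `extract`, `cleanse`, `build_reference_set`, `map_query`) as well as the sample post markup is hypothetical.

```python
import os
import re


def common_suffix(strings):
    """Longest suffix shared by all strings."""
    reversed_prefix = os.path.commonprefix([s[::-1] for s in strings])
    return reversed_prefix[::-1]


def induce_wrapper(examples):
    """Learn (left, right) delimiters from labeled (page, value) pairs.

    Assumption: one labeled value per training page, and the text around
    the value shares a common left and right context across pages.
    """
    lefts, rights = [], []
    for page, value in examples:
        start = page.index(value)
        lefts.append(page[:start])
        rights.append(page[start + len(value):])
    left = common_suffix(lefts)
    right = os.path.commonprefix(rights)
    if not left or not right:
        raise ValueError("no common delimiters found in the examples")
    return left, right


def extract(page, wrapper):
    """Return every string that sits between the learned delimiters."""
    left, right = wrapper
    pattern = re.escape(left) + r"(.*?)" + re.escape(right)
    return re.findall(pattern, page, flags=re.S)


def cleanse(value):
    """Collapse whitespace and lowercase the extracted value."""
    return re.sub(r"\s+", " ", value).strip().lower()


def build_reference_set(pages, wrapper):
    """Extract and cleanse values from all pages, without duplicates."""
    return sorted({cleanse(v) for page in pages for v in extract(page, wrapper)})


def map_query(query, reference_set):
    """Return reference entries sharing at least one token with the query."""
    tokens = set(cleanse(query).split())
    return [ref for ref in reference_set if tokens & set(ref.split())]


# Hypothetical labeled posts used to induce the wrapper.
training = [
    ("<li><b>Honda Civic</b> $4500</li>", "Honda Civic"),
    ("<li><b>Ford Focus</b>, 2009</li>", "Ford Focus"),
]
wrapper = induce_wrapper(training)

# Hypothetical unseen post page.
pages = ["<ul><li><b>Toyota  Corolla</b> $5100</li><li><b>HONDA Accord</b> call</li></ul>"]
reference_set = build_reference_set(pages, wrapper)
matches = map_query("cheap Honda", reference_set)
```

Token overlap is used here only as a stand-in for query mapping; the abstract does not specify the matching method, so a real system would likely need a more robust similarity measure.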