Optimized Extraction of Records from the Web Using Signal Processing and Machine Learning
Author(s) - Roberto Panerai Velloso, Carina F. Dorneles
Publication year - 2020
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/sbbd.2020.13629
Subject(s) - computer science, information extraction, extraction (chemistry), signal (programming language), upper and lower bounds, data mining, machine learning, artificial intelligence, programming language, mathematics, chemistry, chromatography, mathematical analysis
In this paper, we present an optimization of our previous approach to extracting records from web pages. The optimization lowers the upper bound from O(n log n) to O(n) while preserving the quality of the results (i.e., no loss in efficacy). Compared with our previous work, runtime efficiency improves by 47% and the F-score remains at 95%.